GLOBAL ANALYSIS OF TRANSPOSABLE ELEMENTS AS MOLECULAR
MARKERS OF CANCER



This application claims priority to U.S. provisional application Serial No.
60/466,798, filed April 29, 2003, which is herein incorporated by this reference in its
entirety.

FIELD OF THE INVENTION 

This invention relates to the determination of expression patterns, DNA methylation
patterns and chromatin properties of families of transposable elements in order to detect,
classify, characterize and treat cancer.



BACKGROUND 

The human genome comprises numerous families of transposable elements, such as
DNA elements, i.e. Charlie- and Tigger groups (see Smit (1999) Interspersed repeats and
other mementos of transposable elements in mammalian genomes. Current Opinion in
Genetics & Development, 9: 657-663) and retroelements, i.e., LINEs (long interspersed
nuclear elements), SINEs (short interspersed nuclear elements) and HERVs (human
endogenous retroviruses). To date, over 50 families of retroviral elements have been
identified and the members of these families make up greater than 43% of the genome (see
Li et al. (2001) Evolutionary analysis of the human genome. Nature, 409 (6822): 847-9).
Some families can include hundreds to thousands of retroelements and the expression of
retroelement genes is normally suppressed. However, under certain conditions, such as
cancer, retroelements may no longer be suppressed and expression of retroelement genes is
activated, concomitant with changes in DNA methylation patterns and/or chromatin states.

The present invention provides methods of determining patterns of transposable 
element expression, transposable element methylation and chromatin status of transposable 
elements within the genome such that these patterns can be used to diagnose cancer, identify 
a type of cancer, classify a cancer at a particular stage and measure progression of cancer. 

All of the methods of the present invention can be utilized to analyze full-length
transposable element sequences or fragments thereof. These transposable elements include
retroelements and fragments thereof as well as DNA elements and fragments thereof from
mammalian species. Thus, the present invention provides methods of determining patterns
of retroelement expression, retroelement methylation and chromatin status of retroelements
within the genome such that these patterns can be used to diagnose cancer, identify a type of
cancer, classify a cancer at a particular stage and measure progression of cancer. Also
provided are methods of determining DNA element expression, DNA element methylation
and chromatin state of DNA elements within the genome such that these patterns can be
used to diagnose cancer, identify a type of cancer, classify a cancer at a particular stage and
measure progression of cancer.

SUMMARY OF THE INVENTION 

The present invention provides a method of determining an expression pattern of
one or more families of transposable elements in a sample comprising determining
expression of one or more families of transposable elements.

Also provided by the present invention is a method of assigning an expression
pattern of transposable elements to a type of cancerous cell in a sample, comprising: a)
determining expression of one or more families of transposable elements; and b) assigning
the expression pattern obtained from step a) to the type of cancerous cell in the sample.

Further provided by the present invention is a method of diagnosing cancer
comprising: a) determining expression of one or more families of transposable elements in
a sample to obtain an expression pattern; b) matching the expression pattern of step a) with
a known expression pattern for a type of cancer; and c) diagnosing the type of cancer based
on matching of the expression pattern of a) with a known expression pattern for a type of
cancer.
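
By way of illustration only, the matching step b) might be sketched as a similarity search against stored reference patterns. The specification does not prescribe a matching metric; the use of Pearson correlation, the function names `pearson` and `best_match`, the reference labels and all numeric values below are hypothetical assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def best_match(sample, references, families):
    """Return the label of the reference pattern most correlated with the sample."""
    sample_vec = [sample[f] for f in families]
    scores = {
        label: pearson(sample_vec, [ref[f] for f in families])
        for label, ref in references.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical expression values per transposable element family.
families = ["HERV-K", "HERV-W", "LINE-1", "AluY"]
references = {
    "reference_tumor": {"HERV-K": 8.0, "HERV-W": 5.0, "LINE-1": 6.0, "AluY": 2.0},
    "reference_normal": {"HERV-K": 0.5, "HERV-W": 0.4, "LINE-1": 1.0, "AluY": 2.0},
}
sample = {"HERV-K": 7.0, "HERV-W": 4.5, "LINE-1": 5.0, "AluY": 2.5}

best, scores = best_match(sample, references, families)
print(best, scores)
```

A correlation-based score ignores overall signal scale, which may or may not be desirable; a distance-based score would be an equally plausible choice.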

The present invention also provides a method of determining the effectiveness of an
anti-cancer therapeutic in a subject comprising: a) determining expression of one or more
families of transposable elements, in a sample obtained from the subject, to obtain a first
expression pattern; b) administering an anti-cancer therapeutic to the subject; c) determining
expression of one or more families of transposable elements in a sample obtained from the
subject after administration of an anti-cancer therapeutic to obtain a second expression
pattern; and d) comparing the second expression pattern with the first expression pattern
such that if transposable elements are differentially expressed in the second expression
pattern as compared to the first expression pattern, the anti-cancer therapeutic is an effective
anti-cancer therapeutic.
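
As a minimal sketch of the comparison in step d), differential expression between the first and second patterns could be flagged with a fold-change cutoff. The two-fold threshold, the function name `differentially_expressed` and the example values are assumptions made for illustration; the specification does not define a cutoff.

```python
def differentially_expressed(first, second, min_fold=2.0, floor=1e-9):
    """Return {family: fold change} for families whose expression changed
    by at least `min_fold` in either direction (second / first)."""
    changed = {}
    for family, before in first.items():
        if family not in second:
            continue  # family not measured after treatment
        # `floor` avoids division by zero for undetected families.
        fold = max(second[family], floor) / max(before, floor)
        if fold >= min_fold or fold <= 1.0 / min_fold:
            changed[family] = fold
    return changed

# Hypothetical patterns before and after administration of a therapeutic.
first_pattern = {"HERV-K": 8.0, "HERV-W": 4.0, "LINE-1": 6.0}
second_pattern = {"HERV-K": 2.0, "HERV-W": 3.6, "LINE-1": 1.5}

changed = differentially_expressed(first_pattern, second_pattern)
print(changed)
```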

Also provided by the present invention is a method of determining a methylation
pattern of one or more families of transposable elements in a sample comprising
determining methylation of one or more families of transposable elements.

The present invention also provides a method of assigning a methylation pattern of
transposable elements to a type of cancerous cell in a sample, comprising: a) determining
methylation of one or more families of transposable elements; and b) assigning the
methylation pattern obtained from step a) to the type of cancerous cell in the sample.

Also provided by the present invention is a method of diagnosing cancer
comprising: a) determining methylation of one or more families of transposable elements in
a sample to obtain a methylation pattern; b) comparing the methylation pattern of step a)
with a known methylation pattern for a type of cancer; and c) diagnosing the type of cancer
based on matching of the methylation pattern of a) with a known methylation pattern for a
type of cancer.

The present invention also provides a method of determining the effectiveness of an
anti-cancer therapeutic in a subject comprising: a) determining methylation of one or more
families of transposable elements, in a sample obtained from the subject, to obtain a first
methylation pattern; b) administering an anti-cancer therapeutic to the subject; c)
determining methylation of one or more families of transposable elements in a sample
obtained from the subject after administration of an anti-cancer therapeutic to obtain a
second methylation pattern; and d) comparing the second methylation pattern with the first
methylation pattern such that if there is a change in the second methylation pattern as
compared to the first methylation pattern, the anti-cancer therapeutic is an effective
anti-cancer therapeutic.

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 shows RT-PCR from normal and tumor ovarian samples comparing
expression levels of HERV-K and HERV-W. (-) indicates a control without reverse
transcriptase documenting absence of relevant DNA contamination. No HERV-K or HERV-W
expression was detectable in this normal sample; HERV-W expression and even higher HERV-K
expression were detected in this ovarian carcinoma sample.

Figure 2 is a Southern blot analysis of genomic DNA after digestion with MspI or
its methylation-sensitive isoschizomer HpaII (H), respectively, hybridized with a HERV-W
probe spanning the putative promoter region of the element. Equal amounts of DNA were
loaded per sample, i.e. per MspI/HpaII pair. Fragment sizes range from >0.1 kb to >3.0 kb.
Samples represent ovarian carcinoma (T - malignant), ovarian adenoma (B - benign),
borderline ovarian tumor (LMP) and non-tumor ovarian tissue (N). Fragments between 0.3 kb
and 1 kb appear in most of the malignant samples in the HpaII digests, but not in adenoma,
borderline or non-tumor samples, indicating extensive cytosine methylation of this
particular HERV-W region in non-carcinoma ovarian tissue and loss of HERV-W methylation in
ovarian carcinoma. See region defined by arrows.
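
The logic behind reading such a blot can be sketched in code. MspI and HpaII both recognize CCGG and cut between the two cytosines, but HpaII does not cut when the inner cytosine is methylated, so methylated DNA yields larger HpaII fragments. The function name `digest`, the toy sequence and the site positions below are hypothetical and serve only to illustrate the principle.

```python
import re

def digest(seq, methylated_sites=frozenset(), enzyme="MspI"):
    """Return fragment lengths after cutting at CCGG sites.

    methylated_sites: 0-based start positions of CCGG sites whose inner
    cytosine is methylated. HpaII skips those sites; MspI cuts regardless.
    """
    cuts = []
    for match in re.finditer("CCGG", seq):
        start = match.start()
        if enzyme == "HpaII" and start in methylated_sites:
            continue  # methylation of the inner C blocks HpaII
        cuts.append(start + 1)  # both enzymes cut C^CGG
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Toy 22-nt sequence with CCGG sites starting at positions 4 and 14.
toy = "AAAA" + "CCGG" + "TTTTTT" + "CCGG" + "AAAA"

msp = digest(toy, enzyme="MspI")                                  # cuts both sites
hpa_unmeth = digest(toy, enzyme="HpaII")                          # same as MspI
hpa_partial = digest(toy, methylated_sites={4}, enzyme="HpaII")   # one site blocked
hpa_meth = digest(toy, methylated_sites={4, 14}, enzyme="HpaII")  # uncut
print(msp, hpa_unmeth, hpa_partial, hpa_meth)
```

In this toy case, small HpaII fragments appear only when sites are unmethylated, which mirrors the interpretation of the smaller HpaII fragments as a sign of lost methylation.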

Figure 3 is a Southern blot analysis of genomic DNA after digestion with MspI (M) or
its methylation-sensitive isoschizomer HpaII (S), respectively, hybridized with a LINE1
probe spanning the putative promoter region of the element. Equal amounts of DNA were
loaded per sample, i.e. per MspI/HpaII pair. Fragment sizes range from 0.1 kb to >3.0 kb.
Samples represent ovarian carcinoma (T - malignant), borderline ovarian tumor (B) and
non-tumor ovarian tissue (N).

Figure 4 shows hypomethylation and expression of L1 and HERV-W elements in
ovarian cancer. Genomic DNA was digested either with MspI (left) or HpaII (right), and
hybridized with probes specific for the promoter regions of L1 (A) or HERV-W (B)
elements. The restriction enzymes MspI and HpaII recognize the sequence CCGG but HpaII
only cuts when the recognition sequence is unmethylated at the inner cytosine (i.e., the
second C of CCGG) while MspI is indifferent to the methylation status of the inner cytosine.
Brackets indicate bands from restriction cut sites internal to the elements (B = benign
cystic mass; LMP = low-malignancy potential or borderline tumor; N = normal ovary). (C)
Real time RT-PCR was performed to determine expression levels of LINE-1 and HERV-W
elements in representative malignant and non-malignant samples. Values are normalized
(retroelement expression value divided by expression value of the RPS27A control gene).
Shown is the average of 3 replicate assays per sample ±SE. Ribosomal protein S27A (RPS27A)
expression has been previously determined to be unchanged between the malignant and
non-malignant samples examined in this study.
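
The normalization described for panel (C) can be sketched as follows, assuming the inputs are already linear expression values (not raw cycle thresholds) and that SE is the sample standard deviation divided by the square root of the number of replicates. The specification does not state how SE was computed; the function name `normalized_mean_se` and the numbers are hypothetical.

```python
import math
from statistics import mean, stdev

def normalized_mean_se(target_values, control_values):
    """Divide each replicate's retroelement value by its control-gene value,
    then return (mean, standard error) of the normalized replicates."""
    ratios = [t / c for t, c in zip(target_values, control_values)]
    avg = mean(ratios)
    se = stdev(ratios) / math.sqrt(len(ratios))
    return avg, se

# Hypothetical triplicate measurements for one sample.
retroelement = [4.0, 5.0, 6.0]   # e.g. a LINE-1 or HERV-W assay
rps27a = [2.0, 2.0, 2.0]         # control gene measured in the same replicates

avg, se = normalized_mean_se(retroelement, rps27a)
print(f"{avg:.3f} +/- {se:.3f}")
```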

Figure 5 is an example of an array that was utilized to assess retroelement patterns
in cancer cells. Each dot represents a hybridization of the labeled RNA pool (from either a
cancer or control sample; in this case a cancer sample) to the "spots" representing
retroelement sequences. A bright color indicates that the element was expressed in this
sample. The intensity of the dot is correlated with the level of expression. In this array, 3
replicate copies of the elements (spots) are aligned vertically. Different element families
are arranged side by side.
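
Given the layout described for Figure 5 (replicate spots stacked vertically, families side by side), a per-family signal could be summarized by averaging each column of spot intensities. This is only a sketch: the grid representation, the function name `family_signals` and the intensity values are hypothetical, and background subtraction is not shown.

```python
def family_signals(grid, families):
    """Average replicate spot intensities column by column.

    grid: list of rows, one row per replicate; each row holds one intensity
    per element family, in the same order as `families`.
    """
    if any(len(row) != len(families) for row in grid):
        raise ValueError("every replicate row needs one value per family")
    return {
        family: sum(row[i] for row in grid) / len(grid)
        for i, family in enumerate(families)
    }

# Hypothetical intensities: 3 replicate rows x 3 families.
families = ["AluY", "L1PA2", "HERVK"]
grid = [
    [120.0, 900.0, 30.0],
    [130.0, 870.0, 30.0],
    [110.0, 930.0, 30.0],
]

signals = family_signals(grid, families)
print(signals)
```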

DETAILED DESCRIPTION OF THE INVENTION

The present invention may be understood more readily by reference to the following
detailed description of the preferred embodiments of the invention and the Examples
included therein.

Before methods are disclosed and described, it is to be understood that this invention
is not limited to specific methods, as such may, of course, vary. It is also to be understood
that the terminology used herein is for the purpose of describing particular embodiments
only and is not intended to be limiting.

It must be noted that, as used in the specification and the appended claims, the
singular forms "a," "an," and "the" include plural referents unless the context clearly dictates
otherwise. Thus, for example, reference to "a nucleic acid" includes multiple copies of the
nucleic acid and can also include more than one particular species of nucleic acid molecule.
Similarly, reference to "a cell" includes one or more cells, including populations of cells.

Analysis of Expression Patterns 

The present invention provides a method of determining an expression pattern of
one or more families of transposable elements in a sample comprising determining
expression of one or more families of transposable elements.

As used herein a "sample" can be from any organism and can be, but is not limited
to, peripheral blood, plasma, urine, saliva, gastric secretion, feces, bone marrow specimens,
primary tumors, metastatic tissue, embedded tissue sections, frozen tissue sections, cell
preparations, cytological preparations, exfoliate samples (e.g., sputum), fine needle
aspirations, amnion cells, fresh tissue, dry tissue, and cultured cells or tissue. It is further
contemplated that the biological sample of this invention can also be whole cells or cell
organelles (e.g., nuclei). The sample can be unfixed or fixed according to standard
protocols widely available in the art and can also be embedded in a suitable medium for
preparation of the sample. For example, the sample can be embedded in paraffin or other
suitable medium (e.g., epoxy or acrylamide) to facilitate preparation of the biological
specimen for the detection methods of this invention.

The sample can be from a subject or a patient. As utilized herein, the "subject" or
"patient" of the methods described herein can be any animal. In a preferred embodiment,
the animal of the present invention is a human. In addition, determination of expression
patterns is also contemplated for non-human animals which can include, but are not limited
to, cats, dogs, birds, horses, cows, goats, sheep, guinea pigs, hamsters, gerbils, mice and
rabbits.

The sample can comprise a cell or cells selected from the group consisting of: a
carcinoma cell, a fibroma cell, a sarcoma cell, a teratoma cell, a blastoma cell, a breast
tumor cell of epithelial origin, an ovarian tumor cell of epithelial, stromal or germ cell
origin, mixed cell types from a tumor or any other cancer cell. The present invention also
provides for the analysis of a sample comprising a normal cell or normal cells from a
particular tissue. The patterns obtained from normal cells can be compared to the
expression patterns for cancerous cells in order to assess the differences between normal
and cancerous cells.

The term "cancer," when used herein refers to or describes the physiological
condition, preferably in a mammalian subject, that is typically characterized by unregulated
cell growth. Examples of cancer include but are not limited to ras-induced cancers,
colorectal cancer, carcinoma, lymphoma, sarcoma, blastoma and leukemia. More particular
examples of such cancers include squamous cell carcinoma, lung cancer, pancreatic cancer,
cervical cancer, bladder cancer, hepatoma, breast cancer, prostate carcinoma,
rhabdomyosarcoma, colon carcinoma, ovarian cancer and head and neck cancer. While the
term "cancer" as used herein is not limited to any one specific form of the disease, it is
believed that the methods of the invention will be particularly effective for cancers which
are found to be accompanied by changes in transposable element expression, transposable
element methylation and/or changes in chromatin status of transposable elements.

There are numerous transposable element families that can be analyzed by the
methods of the present invention, including, but not limited to, retroelement families and
DNA element families. The retroelement families that can be analyzed utilizing the methods
of this invention include, but are not limited to, endogenous retroviruses (ERVs), short
interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), the
vertebrate long terminal repeat (LTR)-containing elements, and the poly(A)
retrotransposons. The DNA element families that can be analyzed by the methods of the
present invention include, but are not limited to, the Mariner/Tc1 superfamily (e.g. human
Mariner, Tigger, Marna, Golem, Zombi), hAT (hobo/Activator/Tam3) superfamily, TTAA
superfamily (e.g. Looper), MITEs (e.g. MER85), MuDR superfamily (e.g. Ricksha),
T2-family (e.g. Kanga 2) and others. Any combination of retroelement families and the
members of these retroelement families can be analyzed by the methods of the present
invention to determine a pattern of expression, a retroelement methylation pattern and/or a
retroelement chromatin status pattern. For example, one of skill in the art could analyze the
expression of ERVs as well as the expression of SINEs or one of skill in the art could
analyze the expression of SINEs, LINEs and ERVs. As stated above, any combination of
families and members of transposable element families may be analyzed to provide an
expression pattern, chromatin status pattern and/or a methylation pattern. Therefore,
combinations of retroelement families and DNA element families can also be analyzed
by the methods of the present invention. A publicly available database, RepBase Update,
contains consensus sequences of genomic repeats from different organisms that can be
utilized to design the oligonucleotides utilized in the methods of the present invention. This
database can be accessed at www.girinst.org. This database was utilized to identify
consensus sequences for numerous retroelements which were then used to design
oligonucleotide probes for the microarrays of the present invention.

Files were obtained from RepBase Update containing human-specific repeats
(consensus sequences for transposon families). Selected RepBase files were then input into
the OligoArray program, a publicly available software tool for microarray oligo-design at
http://berry.engin.umich.edu/oligoarray, and the design algorithm was run. The BLAST
algorithm at http://www.ncbi.nlm.nih.gov/BLAST/ (Altschul SF, Gish W, Miller W, Myers
EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 1990 Oct 5;215(3):403-10)
was then utilized to verify compatibility of oligonucleotides in the OligoArray output
file with transposon sequences in the human genome sequence
(http://www.ncbi.nlm.nih.gov/genome/guide/human/). Selection of appropriate
oligonucleotides was based on several criteria, such as the quality of match/specificity,
technical parameters and the broad representation of transposable element families.
Utilizing this approach, numerous oligonucleotides were designed based on these consensus
sequences. The identifiers of retroelement consensus sequences and their corresponding
oligonucleotide sequences, which can be utilized in the methods described herein, are listed in
Table 1. Similar analyses can be performed to obtain consensus sequences for
non-retroelement transposable element sequences.
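
To make the idea of screening candidate oligonucleotides against "technical parameters" concrete, the following sketch slides a fixed-length window over a consensus sequence and keeps windows that pass simple filters. It is not the OligoArray algorithm and does not perform the BLAST specificity check; the function names, the GC range, the homopolymer limit and the toy sequence are all illustrative assumptions.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def longest_run(seq):
    """Length of the longest homopolymer run in a non-empty sequence."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def candidate_oligos(consensus, length=50, gc_min=0.4, gc_max=0.6, max_run=5):
    """Return (start, window) pairs that pass the illustrative filters."""
    consensus = consensus.upper()
    candidates = []
    for start in range(len(consensus) - length + 1):
        window = consensus[start:start + length]
        if set(window) - set("ACGT"):
            continue  # skip windows containing ambiguity codes such as N
        if not gc_min <= gc_fraction(window) <= gc_max:
            continue
        if longest_run(window) > max_run:
            continue
        candidates.append((start, window))
    return candidates

# Toy consensus; a short window length is used only to keep the example small.
hits = candidate_oligos("ACGTACGTAC", length=8)
no_hits = candidate_oligos("AAAAAAAAAA", length=8)
ambiguous = candidate_oligos("ACGTNCGTAC", length=8)
print(hits)
```

The default window length of 50 matches the length of the oligonucleotides listed in Table 1; the other defaults carry no such support.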



Table 1 



FLA 


GAGTTCGAGACCAGCCTGGGCAACATAGCGAGACCCCGTCTCTAAAAAAA 


SEQ ID NO: 1 


FLAM A 


GGAGTTCGAGACCAGCGTGGGCAACATAGCGAGACCCCGTCTCTAAAAAA 


SEQ ID NO: 2 


FLAM C 


GGAGTTCGAGACCAGCGTGGGCAACATAGCGAGACCCCGTGTCTAAAAAA 


SEQ ID NO: 3 


AluJo


GAGGCAGGAGGATCGCnrGAGCCCAGGAGTTCGAGGCTGCAGTGAGCTAT 


SEQ ID NO: 4 


AluJb


GGAGTTCGAGACCAGCCTGGGCAACATGGTGAAACCCCGTCTCTACAAAA 


SEQ ID NO: 5 


AluSc


TCACGAGGTCAAGAGATCGAGACCATCCTGGCCAACATGGTGAAACCCCG 


SEQ ID NO: 6 


AluSq 


CCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCCGGGC 


SEQ ID NO: 7 


AluSd


CCAGCX5TGACCAACATGGAGAAACCCX;GTCTCTACTAAAAATACAAAAAT 


SEQ ID NO: 8 


AluSg 


CAAGATGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCCGGGCG 


SEQ ID NO: 9 


AluSx 


ccaacatggtgaaaccccgtctctagtaaaaatacaaaaattagccgggc 


SEQ ID NO: 10 


AluSz


CCAACATGGTGAAAGCCCGTCTCTACTAAAAATACAAAAATTAGCCGGGC 


SEQ ID NO: 11 


AluY 


gagatcgagaccatcctggctaacacggtgaaaccccgtctctactaaaa 


SEQ ID NO: 12 


AluYaS 


CGGGCGGATCACGAGGTCAGGAGATCGAGACCATCCCGGGTAAAACGGTG 


SEQ ID NO: 13 


AluYaB 


gaaaccgcgtctctactaaaactacaaaaaatagccgggcgtagtggcgg 


SEQ ID NO: 14 


AluYbS 


AGACCATCCTGGCTAACAAGGTGAAACGCCGTCTCTACTAAAAATACAAA 


SEQ ID NO: 15 


AluYb9 


agagcatcctggctaacaaggtgaaaccccgtctctactaaaaatacaaa 


SEQ ID NO: 16 


AluYd 


GAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTGTCTACTAAAA 


SEQ ID NO: 17 


AluYc2


gagatcgagaccatcgtggctaacaaggtgaaaccccgtctctactaaaa 


SEQ ID NO: 18 


AluYd3a1


CGCCTGTAGTCCCAGCTACTCGGAGAGGCTGAGGCAGGAGAATGGCGTGA 


SEQ ID NO: 19 


AluYe 


accatcctggctaacacggtgaaaccccgtctctactaaaaatacaaaaa 


SEQ ID NO: 20 


LTR26B 


ATGGATrTGAGGTTTCCTCCCATCTCCTCATTCGGCGGCCCTACGATTAA 


SEQ ID NO: 21 


LTR26C 


ACGGATTTGAGGTTTCCTCCCATCTCCTCATTCGGCAGCCCTACGATTAA


SEQ ID NO: 22 


LTR26D 


GGCGTATTGACTTGCTGTGTGCATCGGGCAATGAACCTATTACGGTTACA 


SEQ ID NO: 23 


AluYal 


GAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAA 


SEQ ID NO: 24 


AluYa4 


CGGGCGGATCACGAGGTCAGGAGATCGAGACCATCCCGGCTAAAACGGTG 


SEQ ID NO: 25 


AluYb3a1 


GAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAA 


SEQ ID NO: 26 


AluYb3a2 


GAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAA 


SEQ ID NO: 27 


AluYeS 


ACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAA 


SEQ ID NO: 28 


AluYf1


GAGATCGAGACCATCCTGGCTAACACGGTGAAACCGCGTCTCTACTAAAA


SEQ ID NO: 29 


AluYae 


GAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAA 


SEQ ID NO: 30 


AluYhQ 


GAGATCGAGACCATCCTGGCTAACGCGGTGAAACCCCGCCTCTACTAAAA 


SEQ ID NO: 31 


AluYiS 


AGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAA 


SEQ ID NO: 32 


AluYbc3a


AGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAA 


SEQ ID NO: 33 


AluYe2 


GACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAA 


SEQ ID NO: 34 


AluYf2 


GATCGAGAGCATCCTGGCTAACACAGTGAAACCCCGTCTCTACTAAAAAA 


SEQ ID NO: 35 


ALU 


GAGGCAGGAGGATCGCTTGAGCCGAGGAGTTCGAGGCTGCAGTGAGCTAT 


SEQ ID NO: 36 


MIR 


GGCTGTGCCACTTACTAGCTGTGTGACCTTGGGCAAGTTACTTAACCTGT 


SEQ ID NO: 37 


L1PA2 


ATCACATGGACACAGGAAGGGGAATATCACACTCTGGGGACTGTGGTGGG 


SEQ ID NO: 38 


L1PA7 


CCTGTCGGGGGGTGGGGGGCTAGGGGAGGGATAGCATTAGGAGAAATACC


SEQ ID NO: 39 


L1PA11 


TGGGCTTAATACGTAGGTGATGGGATGATCTGTGCAGCAAACCACCATGG 


SEQ ID NO: 40 


L1PA15 


TCGGGTACTATGGTTATTACCTGGGTGACGAAATAATCTGTACACCAAAC 


SEQ ID NO: 41 


L1PB1 


ATCTCAGAAATCACGACTAAAGAACTTATTCATGTAACGAAACACCAGCT 


SEQ ID NO: 42 


L1PB3 


AAGTGGGAGCTAAGGTATGGGTACGCAAAGGCATACAGAGTGGTATAATG 


SEQ ID NO: 43 


L1MA2 


GGGAAGGGTAGTGGGGGGTTGGTGGGGAGGTGGGGATGGTTAATGGGTAC 


SEQ ID NO: 44 


L1MA5 


ATAGGGAGAGGTTGGTTAATGGATACAAAATTACAGCTAGATAGGAGGAA


SEQ ID NO: 45 


L1MA9 


AGATCTTAAGTGTTCTCACCACACACAAAAAAATGGTAACTATGTGAGGT 


SEQ ID NO: 46 


THE1B 


CTGCACAWGCTCTCTTGCGTGCCGCCATGTAAGACGTGMC! i rGCTCCTC 


SEQ ID NO: 47 


MSTA 


TCCCCTTGGTGCTGTCCTCGTGATAGTGAGTGAGTTCTCGTGAGATCTGG 


SEQ ID NO: 48 


MSTC 


GATTAATGGATTAATGGGTTATCATGGGAGTGGGACTGGTGGCTTTATAA 


SEQ ID NO: 49 


MLT1A 


TGAGGACACAGTGAGAAGGCvaOOo 1 0 1 AUoAAULrAV3oo/\A i ijAoov^v^ i 


SEQ ID NO: 50


MLT1B 


GGAGAAGACGGCCATCTACAAGCCAAGGAGAGAGGCGTCAGAAGAAACCA 


SEQ ID NO: 51 


MLT1C 


CCAGCAAACGACCAGAAGCTAGGGGAGAGGCATGGAACAGATTCTCCCTC 


SEQ ID NO: 52 


MLT1D 


GGTCAGAGTCAGAGAAGGAGATGTGACGACGGAAGCAGAGGTCGGAGTGA


SEQ ID NO: 53 


MLT1E 


GATTCCGTCTTGNCGNCANTCTTGCTGAGAGNCTCTC1TGCTGGCTTTGA 


SEQ ID NO: 54 


MLT1F 


TGTAGTCCCCTCCCACATTGAATAGGGCTGACCTGTGTGACCAATAGAAT 


SEQ ID NO: 55 


THE1BR 


"CAAGAGGTGAGTTGGGTGCTGTTAAAGGCATTGAGI I \ lAAAAGGGAAGC 


SEQ ID NO: 56 


MSTAR


TCTTTTTGATTTTACAGGCTCATAGGTGGAAGGAACTTGCCTTGTCTCAG


SEQ ID NO: 57


MLT1R


AGCCTGATCATGTAACAGAAANNNCAATAGCGTTCTCTGGAAAGAANACC


SEQ ID NO: 58


MLT2A1 


GGGTGTTGCGAAAGGAGGTTAACATTGGACTCAGTGGGCTGGGGAGAGGC 


SEQ ID NO: 59 


MLT2B2 


TTCCAGATGAGATTAGCATTTGAATCAGCGGACTGAGTAAAGAAGATTGC 


SEQ ID NO: 60 


MLT2C2 


CTCAAGACTGCAACGTGGAAATCCIGCIGNI 1 1 WCCAGCCTCCAAGCCTT 


SEQ ID NO: 61 


MLT2D 


GGCTAGGCTATGGTGTCCAGACGTTTGGTCAAACATTAGTCTGGGIGl 1 1 


SEQ ID NO: 62 


LTR2 


CAATGCTCCCAGCTGATTAAAGCCTCTTCCTTCATAGAACGGGTQTCTAA 


SEQ ID NO: 63 


LTR3 


GCAAGGAGCCCCCTGACCCCTTCTTCCAAACAIACICI IIIGICIIIGIC 


SEQ ID NO: 64 


LTR4 


ATCCTCCTGTCCCACCCATTGGTCTCTCCTGTCCCTTGATTCGTGCAAGA


SEQ ID NO: 65 


LTR5 


ACTCAGAGGCTGGTGGGATCCTCCATATGCTGAACGTTGGTTCCCCGGGC 


SEQ ID NO: 66 


LTR11 


AACTCCGTCACTGTAATCCCAATGTAAAGCAAGAATTCCAAACCAGGAAA 


SEQ ID NO: 67 


LTR12 


GCTTCATTCTTGAAGTCAGCGAGACCAAGAACCCACCGGAAGGAACCAAT


SEQ ID NO: 68 


LTR13 


CTTGTGTCTTTATTTCTACACTCTCTCGTCTCCGCACACGGGGAGAAAAA 


SEQ ID NO: 69 


MER1A 


AAGCTTCATCTGTAKTTACAGCCGCTCCCCATCACTCGCATTACCGCCTG 


SEQ ID NO: 70 


MER1B 


TGATCTGAGGTGGAACAGTTTCATCCCGAAACCATCCCCGCCCCCCGGTC 


SEQ ID NO: 71 


MER2 


AAAATCCACGGATGCTCAAGTCCCTGATATAAAATGGCGTAGTATTTGCA 


SEQ ID NO: 72 


MER3 


ATGTGGCTAYTGAGCACTTGAAATGTGGYTAGTGCGACTGAGGAACTGAA 


SEQ ID NO: 73 


MER4A 


GGACCTCAAGATCTTTACCCTAAAACAGTTCTGYTGAMYTTCACCTTGGC 


SEQ ID NO: 74 


MER4B 


TTGGTCTCGGGAACCCCTTATNrCATAACCCGGACAl I CCI 1 ICGATTGA 


SEQ ID NO: 75 


MER4C


GCTGCCTCTTTCCCCTCCAGCCCGC 1 I l l CCCC l l l AAA I A I I GAAGCCC 


SEQ ID NO: 76


MER5A 


GTCCCCGGACCAGCAGCATCAGCATCACCTGGGAACTTGTTAGAAATGCA 


SEQ ID NO: 77 


MER5B


TCAGTATTTTTTAAARCTCYYCAGGTGATTCCAATGTGGAGCCAAGGTTG


SEQ ID NO: 78


MER6


AAGTCGCAGTTTCCAAGAACCTATCGACGACGTTAAGTGAGGACTTACTG


SEQ ID NO: 79


MER8 


AAAAATCCGCGTATAAGTGGACCCACGCAGTTCAAACCGGTGTTGTTCAA 


SEQ ID NO: 80 


MER9 


GCTGTGAGACCCCTGAl 1 1 CGCACTTCACAGCTCTATA 1 1 ICTGTGTGTG 


SEQ ID NO: 81 


MER11A 


TGATTTTGCCCTTGTCCTGTTTCCTCAGAAGCATGTC5ATUI uGTTCTCC 


SEQ ID NO: 82 


MER11B


ACTTGCTGGTTTTTGCGGGTTGTGGGGCATCACGGAACCTACCGACATGT


SEQ ID NO: 83


MER20


CCCCACAACAAAGAATTATCCGGCCCAAAATGTCGATAGTGCCAAGGTTG


SEQ ID NO: 84


MER21 


SAGCAGAGGRAAAACATGGTTTGAGAGAGGTTTTYCTGMAAYAGRAGGGC 


SEQ ID NO: 85 


MER21B 


CGGTCAGAAGCACAGGTNACAACCTGGNGCTTGCGACTGGCATCTGAAGT 


SEQ ID NO: 86 


MER22 


TGAGTCTCCCCAAAAGTGGAGCCCTTGTGATGACGAGCACAGGTCCGCCT 


SEQ ID NO: 87 


MER28 


AAGACGANGAGGATGAAGACCTTTATGATGATCCACTTCCACTTAATGAA 


SEQ ID NO: 88 


MER30 


TTTTAAGAAAGTrTACGAATTTGTGTTGGGCCGCATTCAAAGCCATCCTG 


SEQ ID NO: 89 


MER35 


GATGAAAAGGGGATCCTGTGCAGAAACCACACTACCCATGAGAGAAGCAA


SEQ ID NO: 90 


MER39 


GGCAGGTCATAGAAACTAGAAGTCCTCTCCCCCAAAGCAAGCCATAAAAC 


SEQ ID NO: 91 


MER44A 


AGGGTTCGGTACTATCCGCGGTTTCAGGCATCCACTGGGGGTCTTGGAAC 


SEQ ID NO: 92 


MER44C 


CGCACCTCAAACTGCAAAAGTTACGGCCACAGTGCGTGATAAGTGCTTAG 


SEQ ID NO: 93 


MER45 


GAAATTCTTAATAATTTTTGAACAAGGGGCCCCGCAI IIICAI i i HjOaC 


SEQ ID NO: 94 


MER48 


TGTTGTTGTGGACGCGCTCTCGGGGTTSGAACCGAYACAAGARCCTTACA 


SEQ ID NO: 95 


LOR1 


TCTTCCTTGGCAATAMTYRTTGTCTCAGTGATTGGCTTTCTGTGCAGTGA 


SEQ ID NO: 96 


SVA 


GGGGAAAGGTGGGGAAAAGATTGAGAAATCGGATGGTTGCCGTGTCTGTG 


SEQ ID NO: 97 


ALR 


GTGGAGATTTCAGCCGCTTTGAGGTCAATGGTAGAATAGGAAATATCTTC


SEQ ID NO: 98 


MSR1 


ggagtcaagacccccgagcccctcctccctcagactcatgagtccagacc


SEQ ID NO: 99 


TAR1 


ACTCATGGAGGGTTAGGGTTCAGGTTCGGGTTCGGGTTCGGGTTGGGGTT 


SEQ ID NO: 100 


CER 


GGTTCTGAGTGTTTGTCCCTCACATAGGATTCCAGAACACTGCTGCTGGG 


SEQ ID NO: 101 


BSR 


TCACAATGCCCCTGTAGGCAGAGCCTAGAGAAGAGTTACATCACCTGGGT 


SEQ ID NO: 102 


HSATII 


GGGTCCATTCGATGATGATCACACTGGATTTCATTCCATAATTCTATTCG 


SEQ ID NO: 103 


HSAn 


CCACTGTCTGTGGTGTGTCTTTGAAAGGTCAGAAGAGATTGNACCi IIGf 


SEQ ID NO: 104 


R65 


TGCRTTTACAAACCTTTAGCTAGAGACAGAGCGCTGATrGGTlaUi^i i i i i 


SEQ ID NO: 105 


SN5 


CCTGACTGCTGAGTCAGGTTACTGTCCCACTATACGTTAAGAGGAGGGAA 


SEQ ID NO: 106 


HIR 


AATATCAGGAACACCGGCATGTGCACTTAGGAGCATGI 1 1 lAAl ! 1 1 IGA 


SEQ ID NO: 107 


GGAAT 


GGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAAT


SEQ ID NO: 108 


KER 


GGATGAGGCAGGAAAGACAGCTGAGGGTCAGAACCCAGGCAGGTCCAATG


SEQ ID NO: 109 


TIGGER1


ACTCGGTGAAGGCTCA6ATGATCGTTAGCAI 1 1 ITTAGCAATAAAGTATT 


SEQ ID NO: 110 


TIGGER2 


TAAAGTTACACCGAGTGTGCCTGCCTCTCCTGCGTCCGCTTCGACCTCCT 


SEQ ID NO: 111 




GGGAGTCAGGAGGATGTTGAGGGAGAGAGAGGGGTGAAGCGTrGAGACGA 


SEQ ID NO: 112 


GSATX 


CAGGCGGGGAGNCTTTCAGGGGGAGGATGAAGTAGGCCTGGGACAAAAGG 


SEQ ID NO: 113 


HERVL 


AGGAGTCTAGTTGTAATAGTATGGAGAACACTGATAGTCCTTGGCATGAA 


SEQ ID NO: 114 


HERVK 


GCGTGTCAGTTGGGTTAAGACCATTGGAAGTACATCGATTATAAATCTCA 


SEQ ID NO: 115 


HERVR 


AAGCGAACAGTATCAGGTGCTCAGAACCGATGAAGAAGCTGAAGATTGAG 


SEQ ID NO: 116 


HRES1 


TGGTTAATGTGTAACAAGGAGGCAGTAGGCGCCAGGTGTGGAGCCAGAGG 


SEQ ID NO: 117 


HERVE 


AAAAGTGAGGAGGAGAGTAAGAACTCCCACTAAAAGTGAAAATTGTCAAA 


SEQ ID NO: 118 


HERVH 


GATAGGACGGGCGAAAAAI 1 1 1 CAC rGCCCCAACACTTCAAGACTA i i i i 


SEQ ID NO: 119 


HERVI 


TTGTAGGATGGTGTGTGATACGCTGTGCCCTAGGATTAATACAAAAGCTC 


SEQ ID NO: 120 


LTR14 


GGGTGGAGTCTTTATGAAGTCTTAACCTGTCTCTTCTCATTCCTTTGTGA 


SEQ ID NO: 121 


HERVKC4 


GGGGATCATTCAGAGAGrTTGAATTCAATTAACAG ITTAAGCCCCCAAAAA 


SEQ ID NO: 122 


MER4I 


AGAGATGAGACGAAAGGTGAGACCAGAGAG I GA ITTTCTTCTAAAATGCT 


SEQ ID NO: 123 


MER49 


AGATGCATGTTTGTTGAATACGCATGCGTGAGGAGCACGTTCATGAATAT 


SEQ ID NO: 124 


MER4D 


CAAGGGGGCTTATCTTAAGTCAAGCTGACTTCAAGTCTTGAGGGAGAGCT 


SEQ ID NO: 125 


MER3dB 


GGGGTCCTGTGTGTCAGTGGGATTGTCGCGCGAGGCTAGGCATAGAAACT 


SEQ ID NO: 126 


iN25 


TGTTGGAGAAGGGATGGTTGTTCCCCNCTGGGNCTGGTANNGGAGTGCAG 


SEQ ID NO: 127 


MER61 


AAGCCTAAWTTTTCGTGGCCGTGTGACAAGGACGCCGTCTTTAGGTGAAC 


SEQ ID NO: 128 


HERV3 


CAACCGTTGCCAAATGAAGAGAAGTGCCTTCNGATGAAGAATTAANTAGT 


SEQ ID NO: 129 


HERV9 


GCAGAGAGCCATAGAAGTAATACCCCTACTTATAGGGTTAGGAATGGCTA 


SEQ ID NO: 130 


HERVS71 


AAACTGGACTAATGTGGTTGTGCGAACAGGTAGA IGG 1 GA 1 FTAAATAAG 


SEQ ID NO: 131 


HSMAR1 


CACTTCTTCAAGGATCTGGACAAC mTTGGAGGGAAAAGGCTTCGAGAA 


SEQ ID NO: 132 


HSMAR2 


TGGTATCATGGCTTAGAAAAGTGTCTTGAACTTGATGGAGGTTATGTTGA


SEQ ID NO: 133 


L1 


AAACAAGGCCATCAAAAAGTGGGCAAAGGATATGAACAGACACTTGTCAA 


SEQ ID NO: 134 


L1MA10 


GTGATGGTTTGAGGGGTGTATGCATATGTGCAAAGTGATCAAATTGTATA 


SEQ ID NO: 135 


1 IVIDO 


TGAGTTTGGGAAGATGAAAAAGTTCTGGAGATGGATGGTGGTGATGGTTG 


SEQ ID NO: 136 


L1MB7 


AGATAGTGGTGATGGTTGCACAACTCTGTGAATATAGTAAAAACCAGTGA 


SEQ ID NO: 137 


1 iMn2 


ATGTTAATAATAGGGGAAACTGTGTGNGGGNGGGGTGAGGGGGTATATGG 


SEQ ID NO: 138 


L1MC3 


CTGTTGGAGTGGGAGGTTACAGATAAGCAAGGGGAGGAGGGTAGAATGAT 


SEQ ID NO: 139 


L1MC4 


TATTTAGGGGTAANGGGGGATCATGTCTGGAAGTTAGTGTGAAATGGTTC 


SEQ ID NO: 140 


L1MD1 


GGAGGAGGGAAGTGGGTGTGGCTATAAAAGGGCAACATGAGGGATGCTTG 


SEQ ID NO: 141 


L1MD2 


GNGNGGGGGAAGGGAGGTGGGTGTGGCTATAAAAGGGCAGCACGAGGGAT


SEQ ID NO: 142 


L1ME2 


AGTGGTTGCCTCTGGGGAGGGTGANTGAGTGGAAAGGGGCATGAGGGAAC 


SEQ ID NO: 143 


L1ME3A 


GGCAAAACTAATCTATGSTGTTAGAAGTCAGGATAGTGGTTACCGTTGGG 


SEQ ID NO: 144 


LSAU 


GGTGTTGGGAGAGCCTCAGCCGGAAl 1 ICGrGGACGGACAAGGGCACAGA 


SEQ ID NO: 145 


LTR1


CTAGAGGTTTGAGCAGCGGGGCACTGAAGAAGCGAGCCACACCCCCATCG 


SEQ ID NO: 146 


LTR15 


ATCCTCGTCAACCCCATCGGTCTGTCTGATTCCTAAATCATCCCCAAACA 


SEQ ID NO: 147 


LTR8 


TTTCTCTATTGCAATrCCCCTGTCTTGATGAATCGGCTCTGTCTAGGCAG 


SEQ ID NO: 148 


LTR9 


TAAACTCGTCGTGTGTGTCCGTGTCCTAAATTTTCCTGGCGCGNGACGAC 


SEQ ID NO: 149 


MER31 


CCTGTACCTATCGCAATGGTCCTGAATAAAGTCTGCCTTACCGTGCI 1 lA 


SEQ ID NO: 150 


MER34 


GCCCAAACCCCTTTGTCTTGTCACGTTTTCACAATTTACI'ACICI 1 IGTC 


SEQ ID NO: 151 


MER41A 


GCAACGTCAGGAAGTTACCCTATATGGTCTAAAAAGGGGAGGCATGAATA 


SEQ ID NO: 152 


MER41B 


TGCCATGGCAACGTCAGGAAGTTACGCTATATGGTCTAAAAAGGGGAGGA 


SEQ ID NO: 153 


MER41C 


TAGCAGAGGACATCTCCCCCGTAATGTTCTTTGGCTT IG I lArCCTATAT 


SEQ ID NO: 154 


MER50 


TGGCCCTCTTCCAAGTGTACTTCGGTTCCTTTCGTTCCTGCTCTAAAACT 


SEQ ID NO: 155 


MER63A 


TTCAAGCTACCAACGTGATGTCACTGAATGSGGAGTTGGGAAAAGATATA 


SEQ ID NO: 156 


MER63B 


ATGTCACTGAATGSGGAGTTGGGAAGAGATGCACAGTAGCACACYATTAT 


SEQ ID NO: 157 


MER63C 


ACAATGTAACGGCTACAGACACGACACACTTTTAAGTTTAATCTGCATTA 


SEQ ID NO: 158 


MER65A 


GAATATGCACATAGTTTACTATGGCACGCGTATrCCCATTGCAATGCTCT 


SEQ ID NO: 159 


MER65B 


ACATTTGCCTGACAACTGTCTCACRAACCTAGCTACTGCAAGAGCCTACT 


SEQ ID NO: 160 


MER66A 


AGACTAGCTGAAACAGGGCCAGGGCAAAAGCACCTCTCCATAAGACACAC 


SEQ ID NO: 161 


MER66B 


CTTGAACACCAGACCAAATTGAAGACTAGCTGAAACAGGGCCAGGGCAAA 


SEQ ID NO: 162 


MER67A 


GCCTCAACCTCGGCCTATAAAGACTTGAACAAACACTAACATAGTTTCTA 


SEQ ID NO: 163 


MER67B 


GACAGAACAACTCCATCCAAACCCCTGCACTAAGAGACTTGACCAAACTC 


SEQ ID NO: 164 


MER67C 


TCTTGAGAACATGTATGTAATGGGCTGTATCTGCTCGGCTATATAAAAGG


SEQ ID NO: 165 


MER68A 


AACCOTGGGCACTGAGTCTCTAATGAGCTTCCCTGGTAGACAACATrTCA 


SEQ ID NO: 166 


MER68B 


TTCCCTTTGCTGATGTTGCCGTGTATCCTTACNRTGTCGCTGTAATAAAT


SEQ ID NO: 167 


MER69A 


CCCCCAAATTGTATAAGCTTCAGGCCCCACAAAACCTGGATCTGCCCCTG 


SEQ ID NO: 168 


MER69B 


TTACAAAATCATTGTCATATGAAGAGGCGATCAAAGAGTATGCAGCCAAA 


SEQ ID NO: 169 


MER70A 


TGTTCTGTCTCACCGGACTCAGACAAGTTGGTAACCAGTGCACAGTGAAC 


SEQ ID NO: 170 


MER70B 


TCNGACCCCTATTCCTGGTGGTTGGCATAGTGATGATCTTTGCTATTCTC


SEQ ID NO: 171 


MER72 


GGCATGAAGCTCAATTGCACATGTGGATGTTTCTCCTTTCATAAATATTC 


SEQ ID NO: 172 


MER73 


GGTGACGGGGTACGACTGGGTTTCAAACAACTTATGTCAGGCCTAAAAAT


SEQ ID NO: 173 


MER74 


GGGGGTATGGGCTCTGGATTGGTTGGTTTGCATATGAAAGGCGCGCTCCC 


SEQ ID NO: 174 


MER75


TGGCCGAAGATTCATTTGATGAATCCGArmTCCGAAATAGACGATTCT 


SEQ ID NO: 175 


MER76 


TGTTGCCTTAATCGGCTNCTCTGACACCCGGCAGCTCAGCTCTCTCTCCA 


SEQ ID NO: 176 


MER77 


GGTGAGGTTCCCTGGTTGGCAATACTCTNTGCATGTTGTCACACATCGTT 


SEQ ID NO: 177 


MER80 


CCATAGGCTTCACCAGACTGGGAAAGGGGCCCATGGCACAAAAAAGGTTA 


SEQ ID NO: 178 


MER82 


tvrrGCAAATGACCGNGAAAGTGCTNCAAGTATTGATnTGGGGTTACAAAT 


SEQ ID NO: 179 


MLT1G 


CACAAATTCTTTGACACTCTTCCCATCGAGGAGTGGGGTCCGTlsrrCCTCT 


SEQ ID NO: 180 


PABL A 


AATAAAAACTCTCTTCCTCCCCAGTTCATCTGCATCTCGTTATTGGGCCA 


SEQ ID NO: 181 


PABL B 


CCAGTTCATCTGCATCTCGTTATTGGGGGACGAGAATAAGCAGCCCGACC 


SEQ ID NO: 182 


MER57I 


GCAGTTATGGGGGATACTCGGCTCTTTGCACATTTGGATNAGAGAAGCAT


SEQ ID NO: 183 


MER65I 


CCTGGATAAATTCGCCTGGGGAACTTGAGGCCCCATATAGACGAAATTAC 


SEQ ID NO: 184 


MER41I 


TTTGTTGGGAACTCAGTTACAAATAACCCTCACCATACCAGTACTTTCTG 


SEQ ID NO: 185 


PTR5 


CATGGTTAAGGAGCCCTTCAGGCTGCCACTGGACTGTGGGAACAGTGGCC 


SEQ ID NO: 186 


L1M2_5 


CGCCTCCTCCACAAAGAAGAACCAAAATAGCGAGTAGATAATCAGACTTT


SEQ ID NO: 187 


LTR10A 


TGCTCCATCTGCGAGACGCACCCTTGTATAGAAGTAAAATTGCCTTGCTG 


SEQ ID NO: 188 


LTR10B 


GCTGAGAGACCCTrTTGTCCTTTGGCTCAGTGTTGGTTCTTCTTTGCAGCA 


SEQ ID NO: 189 


LTR10C 


CAGTGTACTCTCATGGCAAAACT6CTGGTGAGTGIACCCI 1 lUPGCAGAA 


SEQ ID NO: 190 


LTR16A 


CTGCATTGCAGCCCAACTTCTCCCTCTGCCCAATCCTGGTTCCTTCCCTT 


SEQ ID NO: 191 


LTR17 


CGAAGAACCCCAGGTCAGAGAACACGAGGCTTGCCACCATCTTGGAAGTG 


SEQ ID NO: 192 


MER41D


GCACGTAGGCACAGCTTAGTTTAGTCTTTACATAGACAAGACTCCTATAT 


SEQ ID NO: 193 


MER51A 


TCCGCAACCAATGAGACGTTTGCATAGGAGTGTAACTTTGTAACTTCACT 


SEQ ID NO: 194 


MER51B


CTTTACTTCGTCCTGTTCATTTACATAGGGCGTACCCCAAGTAACCAATG 


SEQ ID NO: 195 


MER57A 


ATCTTCTACCACATGGCTGCACTGGAGTCTCTGAACCTACTCTGGTTCTG 


SEQ ID NO: 196 


MER67B 


TATAAATTTGTTCGGACGACGAGGCATCCCTGGAGTCTCTCTGAATCTGC


SEQ ID NO: 197 


MER65C 


CAACCCTGGCTGCTGAAACTGGCTGTTGTAACCTGAAACCAGi 1 1 1 Aid 


SEQ ID NO: 198 


MER83 


TCTGCAGCCCAAGAACCATGGTATAAAATCTCCAGCAAGCCm G 1 CTCC 


SEQ ID NO: 199 


MER84 


CATAAATGCTGCTAAGGAAAAATCCACCGCGGCGCGCTCAGTCCTCTCTT 


SEQ ID NO: 200 


HERV16 


TTGACTATGATGTGTAGGAGGGGTAGGGCTGCTTTAGTAAAATGAGTAAG 


SEQ ID NO: 201 


HERV17 


GAAGGCACCCCTCGCGAGGAAATCTCAACTGCACGACCCCTACTACGCCC 


SEQ ID NO: 202 


PMER1 


GTTCTCAACCTTCGTAATGCCGCGGCCCTTTAATACAGTTCCTGTGGGTC


SEQ ID NO: 203 


MER54 


TGAAAGATACACTGTAAACACCCAGAACCAMGTTCCCTGGAGCCGCATCA 


SEQ ID NO: 204 


LTR18A 


TGTACATACGGCTTGCGGCCAGGCTCACTGGCGCCCAGAGAGAGAGTAAA 


SEQ ID NO: 205 


LTR18B 


ATGAGAGAGCTGGTGAATAAAACCATATTTCACGTGCCTACGGCCCCCCG 


SEQ ID NO: 206 


LTR19A 


AGAGAGTGCTGCTGACTGAAATCGGCCAGAAGGGGGTCTCAGGTTTATTC 


SEQ ID NO: 207 


LTR19B 


GACTGKWGAGCCGCTTTTGGTGTTTGTTTCGTGTTTCTTTAATTCTTACA


SEQ ID NO: 208 


LTR20 


AATAAATTCTGCTCYACCTCACCCTTCAATGTGTCTGCATGCCTAATTGT 


SEQ ID NO: 209 


LTR16C 


GTAACTNGCTTGATAACGCAGGCTTTATTGGGTTCCTTCCCTTCCCTGTC


SEQ ID NO: 210 


LTR21A 


CTGCTTYCCTTGACTGTKAWGGGGGCAGCCGRCAGGTTAATAAARGCTTG 


SEQ ID NO: 211 


LTR21B 


CAATAAAGCTTGCTTGCCTGACTTTGGGTCTCYTCATCCTTTCTCTCGGC 


SEQ ID NO: 212 


MER85 


TTGAGCAGTAGGATATAAATAACTCGGACATGCTTAGCGTTCCAATAATG 


SEQ ID NO: 213 


LTR22 


GTGCYAGCTG^4TTAGGGGCAGCWGCWGTKACAAACCTYYCTTGGWGTSTG 


SEQ ID NO: 214 


LTR23 


CCTTTAAAAACCACTTGTAACTGCTGCTAATTGGAGTGTATATTCAGGGC 


SEQ ID NO: 215 


LTR24 


AAACCTTAACTTCTCCACTTTGGAACGCTGACCCGATTCCTTTGGAGTGT


SEQ ID NO: 216 
HERV23


GTCCTGTCCCCCCAACGATGTGAGATAGAGCCATCTGGGAATGAGCTTTA 


SEQ ID NO: 217 


HERV18 


AGCGGGAATATTAGTGGTGAGTTGTTGCTCCCTGTATTGTTGCTGTGGCC 


SEQ ID NO: 218 


MER87 


ACTTACTGGCTGTCGWGCGGTGAGCAGTACCAGCTTTGGATTCAGTTACA 


SEQ ID NO: 219 


MER74A 


AATGGCAGTCGTCTCCTGATCTGTTGGCCTTACCATACCTGAATAATAAT 


SEQ ID NO: 220 


MER74B 


CTTTTCAATGGCAGTCGTCTCCTGATCTGTTGGCCTTACCATACCTSAAT 


SEQ ID NO: 221 


MER88 


AGGGGAACTTGTGGCAGGGACCAGCCTTATCACACTGGTGGACCTGGTCA 


SEQ ID NO: 222 


MER54B 


GAGCCCAGTCTGCTAGGCGGGAGAGATGCCTGTAAGTTCTTATGTGTGGC 


SEQ ID NO: 223 


MER31A 


GGCTCCTGAACCTTGTCCTAGGCCCATCTGTGCACTTCCTTGTAAAATCC


SEQ ID NO: 224 


MER31B 


GCCCTGTCCTTGGCCTGCWTAGCCCAGTTTTAGCAAGAATCCTGCTAAGT


SEQ ID NO: 225 


MER67D 


ATCCACCTGCCTTTTGTTTCAGNGGAGTTGAGTTCAANCTCTAACCGCTA


SEQ ID NO: 226 


MER31I


GATGATTCAGGTGGTCCTTAATGAACAAAAGGCMACCCAACAAGAAAATG 


SEQ ID NO: 227 


CHARLIE1 


TTCCAGATTGCAACTAACCTTTAAGAAAGTACCAGTTGTGGAGI U I GGT 


SEQ ID NO: 228 


CHARLIE1A


CACCGCAACTAACCTTTAAGAAACTACCACTTGTTGAGTTTTGGTGTAGT


SEQ ID NO: 229 


CHARLIE1B 


CAGTGGAGTTTTCCAGAGGCTACATGACGTGTGATGTCGCAACAGATTGA 


SEQ ID NO: 230 


CHARLIE2 


TAAAATTCTGTGGGGGAAGTGGAATGGAAATACGAGTTCAAGGAGAAAAA 


SEQ ID NO: 231 


MER30B 


CAATCTTTTGGCTTCCCTGGGCCACATTGGAAGAAGAATTGTCTTGGGCC


SEQ ID NO: 232 


MER45B 


CCGCATACGAGTTAAATGCTCTTATATTTGCATTTAAAACTGGCATTGCA 


SEQ ID NO: 233 


MER45C 


GCGAGTATCCCCGTGCCCGAGGGAGCGTGACATTAAATAGCAAATAAAAA 


SEQ ID NO: 234 


LTR25 


GTCTCCGCTGRCAGAGAGCTTTCTTCTTTCACTTATTAAACTTTCACTCC


SEQ ID NO: 235 


LTR26 


TCTCAGTGTAATTGGTCTGTTACTGCGCAGTGGGCATATGAACCTGTTGG 


SEQ ID NO: 236 


HERVK9I 


ATCCCGACTCCTGCGAGAAGTAGCTCACCGTGACAAAGCTGCCTTTGCTT 


SEQ ID NO: 237 


HERVH48I


TCTCTCAAGAATACCCCAAAAATTAAGTTTTTCTTTTTCCAAGGTGCCCA 


SEQ ID NO: 238 


MER11C 


CCTGTGATCTCGCCCTGCCTCCACTTGCCTTGTGATATTCTATTACCYTG 


SEQ ID NO: 239 


MER11D 


TTCATCCCCATGTGACCATCTCACCTCATAATCAAATGACCCTAAATCCC 


SEQ ID NO: 240 


LTR10D 


GGCGACTGGCCAAGGAGAAGCACCCCTCTGCGCAGAAGTAAAATTGCTTT 


SEQ ID NO: 241 


LTR14A 


CCACACTCGCGATGGGCCCCTGGTCCCACTTTCTCTCTCAAACTGTCTTT


SEQ ID NO: 242 


LTR14B 


TTTGCAGCCTCCATACTTAGCGrrTGGCCCCCTGGAGCCACI 1 ICTOTCTC 


SEQ ID NO: 243 


LTR27 


GTGGGACAAGAACTTGGGAATCAGTGCACAAGCCAGACTTGGCCTGGGAA


SEQ ID NO: 244 


LTR28 


ATTGATCCCCACCCTTCACCTATTTTACATATACCCACCCTTTCCTAATT 


SEQ ID NO: 245 


LTR29 


TTAATCAATCTGCCTTNTGrCAGiGAI 1 1 1 1 GAGGGAACCTTCAGGGGGC 


SEQ ID NO: 246 


LTR30 


GTTTTTTTCTCTCTTGGTCCGATCCGTGTCTCTCWCTCGCCGCGGGCWGC 


SEQ ID NO: 247 


LTR31 


TTTCTCTTTTGCAAAACCCATCGTCACAGTGATTGRCTTACTGCGCGCGG 


SEQ ID NO: 248 


MER61B 


ACCCTTTCCTGACTGATTCTCTCTGAATAATGCCCACCTGCGCACTGGGA 


SEQ ID NO: 249 


MER61C 


CCGACCCGCCCCACAAGTGTTTACATCAGATGCTTTTGTGCAGATGAGGG


SEQ ID NO: 250 


MER92A 


CGCTTGCCCACTGTCYCCTTTCTACTGGTTCTGCTTAYCYCTCCCTATAA 


SEQ ID NO: 251 


MER92B 


TTCTGCCTGAACTTTGAGATGCTTGCAGATCTTATGGTCAGAGCGTTCTC 


SEQ ID NO: 252 


MER92C 


TATCTACCCCTTCCTATAAAAGTCCAAGGCAAAACCACCCTGCCGAGACA


SEQ ID NO: 253 


MER93 


GCCCTGGGTTCCTACGTAAGCAAACCGAAAGCTAACTCAGNCGTTTCTTA


SEQ ID NO: 254 


MLT1H 


CACAGATGCATGAGGGAGCCCAGCCGAGACCAGAAGAACCACCCAGCTGA 


SEQ ID NO: 255 


L1P MA2 


GAACGCAGAAACAAATGCATACATYTACAGCGAACTCATTTTCGACAAAG 


SEQ ID NO: 256 


LTR32 


ATGTAAGTCCCGAATAAACGCTATGTCTCATTTGCTGGCTCTGGGTGTCT 


SEQ ID NO: 257 


GOLEM 


GCACAACGACGAAATCGGCTAACGACGCATTTCTCAGAACGTATGCGCGT


SEQ ID NO: 258 


ZOMBI 


TAGTGACACCTTTGCTTTCTGATGGTTCAATGTAOACAAACTTTG 1 1! CA 


SEQ ID NO: 259 


ZOMBI A 


CGGATTTTCAGATTTGGGATGCTCAAGGGGTAAGTATAATGCAAATATTC 


SEQ ID NO: 260 


ZOMBI B 


NCTGCCAGNCAACNACAGNTTGTGCACCTNGNTGGCARAGANACTGACAC 


SEQ ID NO: 261 


LTR33 


CGCTGTTGCTAGCCCCGGGGTGCTTCACCATCCCTTGTTGGTTTCCCTTA


SEQ ID NO: 262 


L1PA12 5 


AAGTCAGCTTCAAATAAAGACCCTGCACAAAGCCTCGGCCCGGTGAAAAC 


SEQ ID NO: 263 


L1PA16 5 


GACAGCCANACAATAGACAGCCTGTCAATAGANATAGCCACACAATAATA 


SEQ ID NO: 264 


L1PBA 5 


AAGAATCTGAACAGCAGCCCTTGAGTCCCAGATCTTCCCTCTGACATAGT 


SEQ ID NO: 265 


L1PBB 5 


AATCTACCCACCTGCTTTAGCCACARCTGGTKYYTACCCAKGGAYACCTC


SEQ ID NO: 266 


L1M3A 5 


AAGAAACATAWTCACATTCAARGGAGTCCCAATATGGGTATCAGCAGATT 


SEQ ID NO: 267 


L1M3B 5 


AGTGGMAATCTCATCAGCCCAGGGATCTRACAGGAGAAGGTCTTCGTCCC 


SEQ ID NO: 268 


L1M3C 5 


YACATCMATAGAAAAGGTCTGAGAGAGYCCCAGAATCCCTAGCCAGGCTG 


SEQ ID NO: 269 


L1M3D 5 


GTCGCGCTACGCTGATANGATTNANCATACCCTANATGCTCGGCGACTGC 


SEQ ID NO: 270 


L1MB6 5 


CACTCAGTGCGAAMAGCATTATACCTGGGGGCATTTGTTGAAAACAWTTA 


SEQ ID NO: 271 


L1MCA 5 


TGAAAGTGGACTTGGATTAGTTGTAAATGTATATTGCAAACTCTAGGGCA


SEQ ID NO: 272 


L1MCB 5 


CTGACACCTACAGGTACAGCAAACAGTAAACACAGTCTAACTCTTAGCCA 


SEQ ID NO: 273 


L1MEA 5 


ACCACAGCCACTGGAAAGAGTGGGGAAAATGCCGGAAAGGAGAGAGCCAG 


SEQ ID NO: 274 


L1MEC 5 


ACAAAAATATCCAGCACCCAACAAGGTAAAATTGACAATGTCTGGCATCC 


SEQ ID NO: 275 


L1ME_ORF2 


TCGTGACCTTGGGYTAGGCAAWGATTTCTTAGATATGACACMAAAAGCAC 


SEQ ID NO: 276 


MER89 


AAGCTCTGAATAAATAGCCTTTGCTTGTTCTCATTTGGKTGGTCTTCATT


SEQ ID NO: 277 


MER90 


CCTCGCTGCARCGAGCAATAAACCCAACTTGTTCAACCACAGGTGTGTTC 


SEQ ID NO: 278 


CHARLIE3 


ACAGCAACCAAAACGAGATTACGGAGTAGACTGGACATAAGCAAGACACT 


SEQ ID NO: 279 


MER91B 


ATAATGACAATTTTGCAACAGATGGCAGTAAAGTGTGTTGAGGAAGGGGG


SEQ ID NO: 280 


HARLEQUIN 


CCTGTACTTCTTCAAATGATAAAAAGCTTCATCGGTACCTTAGTTCACCA


SEQ ID NO: 281 


CHESHIRE 


TGCCTTCCAAGCAATGAATATGCTCAATTNAAATCATATGCTCGTGATTG 


SEQ ID NO: 282 


GOLEM A 


GAAATTGCCTAATGACGCATTTCTCAGAACGTATCCCCGTGGTTAAGCGA 


SEQ ID NO: 283 


GOLEM B 


TCCTGCAAGCTCCATTCATGGTAAGTGCYCTATACAGGTGTACCATTTTT 


SEQ ID NO: 284 


LTR34 


TGTGTCTGTGGCTCGCGTTTTTCCCGGACATGCCCTAAAGCTGGCTTAAT 


SEQ ID NO: 285 


LTR35 


CGTGTTAATTTCYATTACATGGRGAGCCCAGGAACCTGTGGTCNNTAACA 


SEQ ID NO: 286 


LTR36 


CCTGTACTTCTTCCCCCTAAGCTAGCTTTGGAATAAAAAGTCACTTTCTT


SEQ ID NO: 287 
MLT2A2


CAGACTGAAGGCTGCACTGTYGGCTTCCCTACTTTTGAGGTTTTGGGACT


SEQ ID NO: 288 


HAL1 


GNAGGGATGGGGACTGCTTTTCGTNATAAGCCTTGTAGNACTAi i iiiACT 


SEQ ID NO: 289 


MER66I 


CTGGGCCCCTTAGATCAGGTATCCAGAGATTTTTACTCCTCCGGTGCTAG


SEQ ID NO: 290 


LTR37A 


TTCCTTCCCCCACTGTGGAAAAAGCCAGTTTTGCNTCYATTTGCAAATTC


SEQ ID NO: 291 


LTR37B 


GGGAATGTACCTlsrrGTTGACTTTGCTATTTACTATTTGATTAGGGCCCAG 


SEQ ID NO: 292 


CHARLIES 


ACGTTTTCTCACCGATATCACACTGCATATGAACAAGCTAAAI 1 liiAAiiC 


SEQ ID NO: 293 


TIGGER5 


TTAAGGTAGGCTAGGCTAAGCTATGATGTTCGGTAGGTTAGGTGTATTAA 


SEQ ID NO: 294 


TIGGER5 A 


GGTTTCTACTGAATGTGTATCGCTTTCGCACCATCGTAAAGTTGAAAAAT 


SEQ ID NO: 295 


TIGGER5 B


GTTTACCCTCGTGATCGCGCGGCTGACTGGGARCTGCGGYTCACTGYCGC 


SEQ ID NO: 296 


LTR38 


ATCTCCCATCTGCTAGCATTTGATTAATAAAGCTGCTTTCCTTTCACCAC 


SEQ ID NO: 297 


LOOPER 


ATGACAGTTGATGAGCAGTTAGTTGCATTCAAAGGATATTGCCCATTTCG


SEQ ID NO: 298 


HERVK29I 


GCGCCTGACAGACCTGTTGCTGCACACATCTGTACTCTTCAATCAACAAA 


SEQ ID NO: 299 


MER51I 


ACCACCCCTGGTCATTAAGGAGCTACCCTGTCTCCATTAGAHAGAGCAGG 


SEQ ID NO: 300 


MLT1I 


GAGCAGAGCCCCAGCCGACCCGCGATGGACATGTAGCATGAGCAAGAAAT 


SEQ ID NO: 301 


LTR41 


AGGGGTAGTGGCTGCTCCTTATATCTGCTATTCCTATATTCTTTAGAGTT


SEQ ID NO: 302 


MER52A 


CAATAAAGCTCCTCTTCGCCTTGCTCACCCTCCACTTGTCCGCGTACCTC 


SEQ ID NO: 303 


MER52B 


TCTCCTCTGAGCTGTTCTATCGCTCAATAAAGCTCCTCTTCATCTTGCTC 


SEQ ID NO: 304 


MER52G 


AGGATGGCCAGAGGACAAAGRGGGCAGAGAGACAATGGGACWGGATGACC 


SEQ ID NO: 305 


MER94 


GCCTGGGACAGTCCTGGTTTATRCCTGTTGTCCTGGCGTAATTATTAATA 


SEQ ID NO: 306 


CHARLIES 


GAGGGGNAACCACACAAAAAGAGNAGGCTAATAAGTTGGCCAAAATAAGC 


SEQ ID NO: 307 


LTR39 


TTTCTCCCGCTGCAAAATCTCGGTGTSGATGTTTGGTTTTACTGCGCCGG 


SEQ ID NO: 308 


LTR40A 


TCTCTGACCCAGGAGTCTCGTGTCTTCTGCCAGCATCCATGAAACTGTGG 


SEQ ID NO: 309 


LTR40B 


TCTCTGACCCAGGAGTCTCATGTCTTCTGCCAGCATCCATGAAACTGTGG 


SEQ ID NO: 310 


HERVL 40 


TGCTTGGATGTCCTGTTGATAGTAGCCTTAATTAAATGCTNTATGAGACA 


SEQ ID NO: 311 


LTR9B 


GTGTCGTTTTATCTAAATCGGCGCGAGGACCAAGGACCCTGGTGTTCCTC 


SEQ ID NO: 312 


HUERS-P3 


CTCCAAATGGTGCTGCAGACCGAACCACACATAGACACGCCATTCTTCCA 


SEQ ID NO: 313 


HUERS-P3B 


GAGATSAAATCAAAATCATTGACAGGCTCAGGGAAAATGCCGGCTTCAGC 


SEQ ID NO: 314 


HUERS-P2 


TAGACACAGGNAAGAGACCTGGGAAGCTTNAGTAGCCACCGTGTAAGCCC 


SEQ ID NO: 315 


LTR20B 


TTCGCTCCAACCTCACCCTTTGTGTCCATGCTCGTTAATTTTCTTGGTCG 


SEQ ID NO: 316 


HERVG25 


CTRAGRACCCTTAAACCAGCCTCRRGARAARTCCTAACTGCTGTTNCCTA 


SEQ ID NO: 317 


LTR42 


CTTCTTTCTTTGGAATCCCAACTGGCCCCATCTCAGGANGGTTTGGGGYA


SEQ ID NO: 318 


LTR43 


TTCYTTTGCAATAAATTRCTCTATGCTGCATCTCCTTTGCTGTGTGTCTC


SEQ ID NO: 319 


LTR44 


GTGTGTCTTCCCAGGTCAATCCTCACATTTGGCTTCCAATAAACCTTTAT


SEQ ID NO: 320 


MER95 


GTCTCCCGGTTCGCGARCTGTWCTTTCTCTYATTGTATGCACAATAAACT 


SEQ ID NO: 321 


L1MC5 


TAAATGACACCATRGGGATGCAATCAGCAAAATCCAGACTGTGGGAAACT 


SEQ ID NO: 322 


MLT1J 


ATGGAGCAGAGGTGCGATACCAGCCCTGGACTGCCTACCTCTAGACTTCT


SEQ ID NO: 323 


HERVFH21 


CAAGACATGATGCTACTCCAAGAATACCGACGGCTCCAGGAACAGCAGTC 


SEQ ID NO: 324 


ZOMBI C 


AAACTCATTTGGCAGCAAAACCTGACCTGAACTGATATGAGGCTAi i \AT 


SEQ ID NO: 325 


MER96 


AATTTAAGGAGGCACTCACTCTCAGGGTCGTGCAAGTGCAGGGTCGGCAT 


SEQ ID NO: 326 


LTR45 


GCCCACCTCCTGTCTCCTTGCTGGCCGGTTTTGCAATAAAGCCTTTCTTT


SEQ ID NO: 327 


LTR46 


TCTGGCATTAAGCTGGTCCCCCACYTYYRCAGGTTTTNTGCTGGATATAA


SEQ ID NO: 328 


MER99 


GCTTTCAACTTGATGTCAGTGGATTCCTTCGAATCAGTAATGTCTCTATG 


SEQ ID NO: 329 


RICKSHA 


AATACGGTTCGTCTGCTCATAACTGTTATACCCGTGCGACTGTCATTAGT


SEQ ID NO: 330 


MER96B 


CTCAGGCTCCAGTATGAGTNGACACTGCACAGTTRCTGATCCTGTATTTA


SEQ ID NO: 331 


MLT1K 


TCTTGCCACCACGNGGAGAGAGCCTGCCTGAGAATGAAGCCAACACAGAG 


SEQ ID NO: 332 


HERVK3I 


CCCTTGGACCAGTCTAAAGCACCACATTAACATCTTATATGTAGTCCTTG 


SEQ ID NO: 333 


LTR22A 


CGCTGCATACCTGTGTCTGAGTACTCATTTCATCCATCGGTCGGCCAGGG


SEQ ID NO: 334 


LTR47A 


ACACAGACGTGGCTTCTGTTTGTAAGTCCCTATTAAATGTTTCTTTCTGA


SEQ ID NO: 335 


LTR47B 


TCCTTCTGCGTTTGGGGGTCATTTTGCATATACGGCCCTTTCACGAAACA


SEQ ID NO: 336 


MER101 



TTCGTTTTACACCGAAGGCTGCATCTCCCCGGTTTGCAAACTGTTCACTG 


SEQ ID NO: 337 


LTR48 


" CAGTTCATTTCAGCAAACCTTCAGAGGGGACAGAGGGGAA^iUi i luui i i 


SEQ ID NO: 338 


LTR48B 


TAATCATTCTCCTCTGTGATTCCCCCATGCTATGCACGTTAAAATAAATT 


SEQ ID NO: 339 


LTR49 


TGCCTTTTGTCAGTTGATTTTTCAGCGAACCTTCAGAGGGCGAAGGGGAA


SEQ ID NO: 340 


LTR8A 


CTCTTTCTTTATTGCAATGCCATGGTCTTTGTCTGTGCAGCGGGCAGGAA 


SEQ ID NO: 341 


MER41E


GTAGAAGCCCCAAACCCYMTTGGCGCAACTCWCTCTCTTGAGTATGCCCG


SEQ ID NO: 342 


MLT2E 


TCCCCCCTCCAGACCTTCACTTCCCCAGCTCCTCCCACAATTGTATAAGG 


SEQ ID NO: 343 


LTR50 


TCTCTGTTAAAATAACTGGTGTGGTTTCTGTCTTCTCCTGACTGGACCCT


SEQ ID NO: 344 


LTR51 


TCTTTGAAGAGAGAGCGCCTTTGGTCTATGCCAGAGACTATCTCTTCCCA


SEQ ID NO: 345 


MER103 


GTGCATTGTGAATCTCCAAGAGGGGAAATATAGTATGCAGTRTTTCCCAA 


SEQ ID NO: 346 


MER104 


TTAACATCTCTGAAATCGGGATGCATCTTACAATCGATGGCATGTCATAG 


SEQ ID NO: 347 


CHESHIRE A 


ACAACGGCAGAGTTGAGTAGTTGCGACAGAGACCGTATGGCCCGCAAAGC 


SEQ ID NO: 348 


CHESHIRE B 


ACAACGGCAGAGTTGAGTAGTTGCGACAGAGACCGTATGGCCCGCAAAGC 


SEQ ID NO: 349 


HUERS-P1 


ATCTGCTCTTCGCCTTGCCCAGAGACCCCACTGTGAATTACCATTTGGAG


SEQ ID NO: 350 


LTR45B 


GTATTGGCTTCGCATCAGGCAGCAGNNAGCCCATTGATTGC M KU I AACA 


SEQ ID NO: 351 


LTR52 


ATACCCTCTTGGTGTGTGTGTGGCATCATCAGTCTTAACATCCAAACCAA 


SEQ ID NO: 352 


MER105 


GCCCTAAGGCATCCATTGTATGTAATGAATTAACTTCTCTCCTATGCATC 


SEQ ID NO: 353 


LTR53 


CATCTGTCCAGTGTTGGGTGTCATGTGTTTAKCCATCCCCATAACCCTAG


SEQ ID NO: 354 


LTR54 


TATAAAGCCAACCTCCTCTGCTCAGCTCATYGGAACACTCATTCTATTTT


SEQ ID NO: 355 


MER106 


TGTGGTATTAAAATTTCATGGNGGGGGGGGGTGATTAGGAAAAAAATGTC 


SEQ ID NO: 356 


MER107 


TTCTACTTATCACTAGAGACAGAAACTAAAAACCATGGCTTCAGGCTGCT 


SEQ ID NO: 357 


MER44B 


ACTTAATAATGGCCCCAAAGCGCAAGAGTAGTGATGCTGGGATATTGTTA


SEQ ID NO: 358 
MER61I


CTACTGACAGCAGGGGAGATAGGGCATACGTGGGTAGAGCGGATAATTCC 


SEQ ID NO: 359 


HERVL68 


CCCTGGAAGGCnTTCAGGTCAGCTTGAACTTACTGGCCAGAGTTGTGCTG 


SEQ ID NO: 360 


MER83B 


CGTCTTTGGAGACAGGCCCTTCTCTGCTGTGCTGCCCGTTGCAACCTTGC 


SEQ ID NO: 361 


MER83C 


GCACGTAGCCCCCTCCAGTACAAGCCTATAAAACTTCCCTCCAGCCCCTG 


SEQ ID NO: 362 


MLT1L 


GAAAGAACCTGGGTCCTTGATGATATCGTTGAGCCGCTGAATTAACCAAC 


SEQ ID NO: 363 


MLT2F 


ATCAGACGCARAGACAACAGCCTTACAGAGACTGCTTAACCAGCTCCCAC 


SEQ ID NO: 364 


LTR55 


TCATATCTTTTTCCTTGATCAGCCCCCAAATCCCTTRAACCCCCTTCACA


SEQ ID NO: 365 


LTR56 


CTCTTTTTTGCCTTTAAAAATCCACTTGTAACTGCTGCTAATTGGAGTGT 


SEQ ID NO: 366 


LTR57 


GAGTGCCCTGTATGTAAGTCCTAATAAACTCATCTACTTATCAAGCTGGA 


SEQ ID NO: 367 


LTR58 


AGCCGCAAGCCTATTAAACCTTGCCTGAGAAAATCGGTTTGGCCTGGTGT 


SEQ ID NO: 368 


LTR59 


ATTTTTCCTRGRTGTGGCCTCAAGCTGGCTCAGTAAACCTCGATGNTTTG 


SEQ ID NO: 369 


MER4BI 


CTGANAGGATAAAGATACCTCGTGACAAAGCCTCCTGGGTATAATACTCC 


SEQ ID NO: 370 


MER50I 


AAAATGGCTTCCCTGGGTTCTTCCCTTTTTAGGCCCACTTGTTAGTCTCC 


SEQ ID NO: 371 


LOR1I 


TCCAATTACAGGTGTGACGTTTTCATTCCTCATCATTATCCCACAACGCC 


SEQ ID NO: 372 


LTR26E 


TCGGTGTATTGACTTGCCGCGCATCGGGCAACAAACCTATTACGGTCACA 


SEQ ID NO: 373 


LTR16A1 


CTGCCCTATCCTGCTTCCCTCACTCCCTTACAAGTTTCTCCTGAGAGCAC


SEQ ID NO: 374 


LTR24B 


TCTTTGGAATCTGTGYTTCCNGGGTGGNCCATCNTCAAACTTTGCACTTG 


SEQ ID NO: 375 


LTR16D 


CCCGCTCCTGGTCCCTCCCCTTTTATCTTTGACAGGNTTTCCCCTAATAA 


SEQ ID NO: 376 


LTR60 


CTTCAARAAAAATGYGACATCATAAAAACCCCGTGCAGACTCTCAGGGCT 


SEQ ID NO: 377 


MLT1E1 


GTAGGCAGAATTCTAAGATGGCCCCCAAGATTCCCACCCCCTGGTGTACA 


SEQ ID NO: 378 


MLT1J1 


TAGCCAACGGAATGTAAGCAGAAGTGATGTGCGCCACTTCCAGGCCTGGC


SEQ ID NO: 379 


MLT1J2 


CCTGAGTCACTACNTGGAGGAGAGCCACCCACACCCGACCAGAACCCNCA 


SEQ ID NO: 380 


LTR1B 


TCRGCTRGGGRCRGTCAGAGARGAG^^TCAGCCGCTGGAYNGCCAAACTCC 


SEQ ID NO: 381 


MER109 


TGTCCRTCATTNCTGGCATNGTCAGGAGTAGGTAMGGTCTGGDCCAACTG 


SEQ ID NO: 382 


MLT1E2 


GCCCCCCAAAGATGTCCATGCCCTAATCCCTGGAACCTGTGAATATGTTA 


SEQ ID NO: 383 


LTR22B 


CACTGGGTGGTCGGCAACTGTTTACAGCACTCTCCTGGGAGTCTGTAAGC


SEQ ID NO: 384 


MLT1G1 


TTTCCAAAGATGGCCGCAACAATATCTCCCATCCCACATGCTCTTCTTAC 


SEQ ID NO: 385 


L1MCC 5 


GCCCATTTCCAGGCATAAATACTATTTACCTCAGTCTCTACTGTTCTTCT 


SEQ ID NO: 386 


MER110 


CTCGCCTCACTGTGCCCACCAATCCAAAGCTATTATGTCATAAACTCTGG 


SEQ ID NO: 387 


HERVK11I


CAAAGAATCCTGCGTCAAAATCGAGAGAACGAACAAGCCTTCATCGCCAT 


SEQ ID NO: 388 


HERVK14I 


AATAAAAAGGCTGGACAAGATATATGGTGGAGGGATGCACATAGAAAGAG 


SEQ ID NO: 389 


HERVK13I


CAGGCGTCTCCACGGAGTCCAATGAAAAACTCGAAGCCAGCGACAAGCAA 


SEQ ID NO: 390 


HERVK14CI 


CTCATAGCTCCTATAATGCCATTGAACACCAGTGAGAGACGATTAGACGT 


SEQ ID NO: 391 


LTR14C 


ACCGCGACTGCTACACATCTTATCGAATGACTCACGAGTTCTCCTTCACT 


SEQ ID NO: 392 


LTR61 


ATCCACTGAGCTGGTGCGTACCTTAAAATAAATAACAATCCTCCTGTATT 


SEQ ID NO: 393 


HERV49I


CTCAATTTGTTTTCTCCGCTCCTTTGCCTATCTCTATCTAACAACCTCTA


SEQ ID NO: 394 


HERV15I 


ATAGAGGCAGTAGTAACCCGAAACACTACGATGCTATTGACGGCATTAAC 


SEQ ID NO: 395 


LTR62 


CAAANATGrrGTGGAGCTGGTTATOTCTGACCTTGCRCTGCTCACGACACA 


SEQ ID NO: 396 


LTR64 


GGCTATAGGCNTYCCTCAGTCTACAGTCCTCAGTAAGACTTCTGAATAAA 


SEQ ID NO: 397 


MER112 


CCAGACCAGTGGCT 1 ICAAACI 1 1 1 1 1 1 GAG 1 A IGACCCACAGTAAGAAA 


SEQ ID NO: 398 


MER113 


AAGCACCAAACTGAGACTTTCTCCTTGATGTAATCAGAAGGATTGAAAGA 


SEQ ID NO: 399 


MER110A 


TTACCCAATCCTAATCAAGCCCCTACATTGAAAGACCTGCCTTAAATCAG 


SEQ ID NO: 400 


LTR33A 


CTTCTTGCTGTTGCTAATCTCTGGGTTGCCTCACCATTGNTTCCCTGTTT


SEQ ID NO: 401 


MLT1F1 


CCCCGGCCGACATCTTGACTGCAACCTCATGAGAGACCCTGAGCCAGAAC


SEQ ID NO: 402 


SATR1 


ACACCCGCCCCSTACVCCCACMCCCCCTGTGATATTGTTCGTAATATCCA 


SEQ ID NO: 403 


MER115 


TTTAAATATTTAGACATATGGTATGTGGGCCTCCATTTGTACTCTTGCCC


SEQ ID NO: 404 


MER117 


GCACAGGAGGGGGAAGTAGCAGCANATATGCTATGTATTTGCCATCCCTG 


SEQ ID NO: 405 


MER20B 


TAGGTGCAAGCATCTGACTACTTCATTATGTCTTCTAGTGTAGTCATGCC 


SEQ ID NO: 406 


LTR65 


TCCATGGTTCCTCTGGTGTGCAGTCTCCCTCATTGCAATAAGTCAATAAA 


SEQ ID NO: 407 


LTR38B 


TGAAGYGGTTGCTTTGGATAGGAATCYGGCCRGTTCCCCATTAOIAGI 1 1 


SEQ ID NO: 408 


CR1 HS 


GGATTGACAGCAGATCAMGGGAAGTGATTATACCCCTTTACAATGCCTTG


SEQ ID NO: 409 


L1ME4 


GTGGGATGGACAGGGATGGGAGGGACTGACTTTTCACTGTATACCTTTTT


SEQ ID NO: 410 


MLT1H1 


TGGACCCTCCAGACCAGCCCATCTGCCAGCTGAATACCACTGAGTGACCT 


SEQ ID NO: 411


LTR2B 


GGGACAGAAATTGTGCACTCGGGGAGCTCGGATTTTAAGGCAGTAGCTTG


SEQ ID NO: 412 


MER101B 


CCAGAAACCACCTCCCCACAAGCCCACTAGAAACAAACATCTGACAGAGA 


SEQ ID NO: 413 


MER45R 


TAGCCNATAAAATACTCTTAACAGCTCCAGNAACAGTTGCATCAGCAGAA 


SEQ ID NO: 414


MLT1G2 


TTTAAAACATGGCCGCAAATTCTTTGACACTCCTCTCATTGAGANGTGGG


SEQ ID NO: 415


MSTA1 


CTTGCTTCCTCTCTCACCATGTGATCTCTGCACACGCTGGCTCCCCTTCC 


SEQ ID NO: 416 


LTR6A 


GAATTCGTCTCAAAGTGTGGCGTTTCTCTATAACTCGCTCGGTTACAACA


SEQ ID NO: 417 


L3 


GGTCTGGAAACCATGTCATATGAGGAACGGTTGAAGGAACTGGGGATGTT 


SEQ ID NO: 418 


LTR66 


TGCCATTTACGTGGGATAAAGCTTGTTTACCCTTAAAGGTATTGTGTGTG 


SEQ ID NO: 419 


PRIMA41


ACCTTTTGTCGGAACTCGGAGTTATGAACGACCCTCACCATACCGATGCT 


SEQ ID NO: 420 


MARNA 


TATNGCCTCCCAAGGTGACTACTTTGAAGGGGACAACACTCATTTGGATG 


SEQ ID NO: 421 


MER119 


TTACTGAGACACTAAGGGCGGCGTGAACCGAGAAAGTTTGGGAACCTCTG 


SEQ ID NO: 422 


LTR67 


GTTCTCCAGCCCTCCCGGAGATTCTGTGAGCTACCCAATATCCTTTAATA


SEQ ID NO: 423 


L1M3DE 5 


CGGGCNGATTGGTGAGATCCNTCTCCTACACGAGGCCAGTCTGACAAGAC 


SEQ ID NO: 424 


RICKSHA 0 


CTCTTATGGACTATCTCCGTGCAATTGCCCATAATCTATCCCTGTAATAT 


SEQ ID NO: 425 


MER4E 


AGGGGTCTGGGGAGTCATGCCCTACAAACCATAAATTCTCATCAGATGGG 


SEQ ID NO: 426 


MER104A 


ACCTTTCGCGTTTCAGTTAACAAACCATTTAAGGACCATTTGAGGAAGGA


SEQ ID NO: 427 


LTR40C 


TGCTCATGCTGCTTGCTGTGYCATGAGTAATAAAGTCGTTTGTCTCTGAC


SEQ ID NO: 428 


LTR54B 


TGGTCAAGCTACTTTACAAAAGCCAAACTGGTCTGCGATGCCCAGCGGAG 


SEQ ID NO: 429 
MIR3


GGAAGCAGTATGGTATAGTGGAAAGAACAACTGGACTAGGAGTCAGGAGA 


SEQ ID NO: 430 


MLT1G3 


CCAGCTGTGAAGTCATCCXCAGGCTCTNNCAGYCMTCCCCAGCCTTCAAG 


SEQ ID NO: 431 


MSTA2 


CCACTTCCCCTTTGACCTTCTCTGCCATGTTATGATGCAGCATGAAAGCC 


SEQ ID NO: 432 


UMD1 5 


TTTGAGAACTGAACTAAAGGATAGACCACTACCCAGGTCCCAGACTGGCC 


SEQ ID NO: 433 


LTR10E 


ARTGCTAATTTTTCTTTGCAGCACCGAGGAACAAGCATTCTGTTTCTAAA 


SEQ ID NO: 434 


LTR24C 


TCTCTGGAGTCTGTGTTTCCTGAATGGCCATTCCCAGCTTTTNACTTGAA 


SEQ ID NO: 435 


MLT1C1 


TGGAGTGATGCAGCCATAAGCCAAGGAATGCCAGCAGCCAAGCCACCAGA 


SEQ ID NO: 436 


MSTD 


GTGGGTTTGTTATAAAAGNAAGTTCGGCCCCCTTTTGCTCTCTCNCTCTC


SEQ ID NO: 437 


LTR68 


ATGTTTACGTCATATACATTTCCATGTCTCAGGAGGCTAGGGCTTTTTAC 


SEQ ID NO: 438 


L1MED 5 


TAAAAACCCAGTGGATAGGTNAAACAGCAGATTAGANACAGCTGAAGAGA 


SEQ ID NO: 439 


L1ME5 


ACTGAAAGGAAATATACACCAAAATGTTAACAGTGGTTATCTCTGGGTGG 


SEQ ID NO: 440 


TIGGER6A 


TAGAAGAAATAGCTGACCGTGGGAATGTTGACACTGCCGCCATTTGAGAG


SEQ ID NO: 441 


MER51C 


AGACCAAATCCTTGATCCAGATAAGGGGTAGCCAATAGGAACCTCAAAAG 


SEQ ID NO: 442 


LTR6B 


CCGGCTAAATAAACGGACTCTTAATTCGTCTCAAAGTGTGGCGTTTTCTC


SEQ ID NO: 443 


MER21A 


TCCACAGTTCCTGGCTCATAACTCCCATAGCCCTTGTTACAGTCTTTTGT 


SEQ ID NO: 444 


MER34B 


CCACAAGTTGCTGCCCCTAGAGACTCAAAGTCCTTTTCCTTTGTCTTGTC 


SEQ ID NO: 445 


LTR3B 


AGTTTCTTTTGTCTTAAGTTTTCATTTCTGCGTTCGTCCCCCTTCGTTCA 


SEQ ID NO: 446 


MER54A 


AGGCGGTTGTATAAGGCAGATATCTGGATGGACCACATTGAGGAACTGGG 


SEQ ID NO: 447 


MER74C 


GCCTTTCATCTATCCGAGTGTCANTGTGTTGTGTCCCGCCATCAAAAGAA 


SEQ ID NO: 448 


ERVL 


AAGAGTAAACATCACTCAAGGACTTTACCTCCTCTTGTGGGGAAGGGGTT 


SEQ ID NO: 449 


HERVL74 


AAATACCCCNAATAATTGATGTCAAAACTGACGTCAAGACANAAAGGGGT 


SEQ ID NO: 450 


MER83AI


TAAGTCCCAACTCAGGGATTTAGGTCGACGTAAGCTCCTGACCGACTAAC 


SEQ ID NO: 451 


MER83BI 


TCTCCGATGAGTTCTTTCCTGCAGCAAGATGGAATATCCTAAGTGCCAGA 


SEQ ID NO: 452 


MER84I 


ATTTTCCCTTTCTTGAGACCCCAATAGGCAGCAGGTAGACATGAGCATGG 


SEQ ID NO: 453 


LTR75 


TAATAAACTGTCTGAATCTAAAAGTGGCTCGTTGTATCTTTACCAGCCGA 


SEQ ID NO: 454 


L1PA7 5 


CACCGAGCTAGCTGCAGGAGrniTI 1 1 1 1 1 CG rACCCCAGTGGCGCCTG 


SEQ ID NO: 455 


L1PA13 5 


CTTTAGCCCTAGGGGAACTGTCGGACCTGAACTCTGCAGGGCGGTCTTGC 


SEQ ID NO: 456 


L1M1 5 


AAGAAACAAATAACATACAATGGAGCTCCAATACGTCTGGCAGCAGACTT 


SEQ ID NO: 457 


L1M2A 5 


CATGTCAGACCCGACACCAAGAGGGATCCCCTCGGCTAAGTCTCCCCATT 


SEQ ID NO: 458 


L1M1B_5 


CCGATTCGGGACGGGCAGCGCTCTGATTGTTTACTAGAGCGGAGGCAAAC 


SEQ ID NO: 459 


L1MB3 5 


AAAGGGGTGGGGATGGAGCTGTAAAGGAGCAGAGTTTTTGTATGTTATTG 


SEQ ID NO: 460 


L1MDB 5 


CACAAAAGTAGGCCAGGACCTGCATGCTAAACCTAAACAGGGTGACTGCC 


SEQ ID NO: 461 


L1HS 


CACAGGAAGGGGAATATCACACTCTGGGGACTGTGGTGGGGTCGGGGGAG 


SEQ ID NO: 462 


L1PA3 


AACACATGGACACAGGAAGGGGAAGATCACACTCTGGGGACTGTTGTGGG 


SEQ ID NO: 463 


L1PA4 


AACACATGGACACAGGAAGGGGAACATCACACAGCGGGGCGTGTTGTGGG 


SEQ ID NO: 464 


L1PA5 


GAACACTTGGACACAGGAAGGGGAACATCACACACCGGGGGCTGTTGTGG 


SEQ ID NO: 465 


L1PA6 


GAGAAATACCTAATGTAAATGACGAGTTGATGGGTGCAGCAAACCAACAT 


SEQ ID NO: 466 


L1PA8 


AGGACAAATACCTAATGCATGCGGGGCTTAAAACGTAGATGAGGGGTTGA 


SEQ ID NO: 467 


L1PA10 


ATAGCTAATGCATGCTGGGCTTAATACCTAGGTGATGGGTTGATAGGTGC


SEQ ID NO: 468 


L1PA12 


CTTAATACCTGGGTGATGAAATAATCTGTACAACAAACCCCCATGACACA 


SEQ ID NO: 469 


L1PA13 


TACCTGGGTGATGAAATAATCTGTAGAACAAAGGCGGATGACACAAGTTT


SEQ ID NO: 470 


L1PA14 


GGGAGAGGAGCAGAAAAGATAACTATTGGGTACTGGGCTTAATACCTGGG 


SEQ ID NO: 471 


L1PA16 


TGGGTGATGGGATCATTCGTACCCCAAACCTCAGCATCACGCAATATACC 


SEQ ID NO: 472 


L1PB2 


ATCTCAGAAATCACCA0TAAA6AACTTATCCATGTAACCAAAAACCACCT 


SEQ ID NO: 473 


L1PB4 


KTACACTAAAAGCCCAGACTTCACCACTACGCAATATATCCATGTAACAA 


SEQ ID NO: 474 


L1MA1 


ATTCTCCATGATGTGCTTATTTCACATTGCATGCCTGTATCAAAACATCT 


SEQ ID NO: 475 


L1MA3 


GCTGGGAAGGGTAGTGGGGTGGGGGGGAAGTGGGGATGGTTAATGGGTAG 


SEQ ID NO: 476 


L1MA4 


GGAGGGGGGGAATGAAGAGAGGTTGGTTAATGGGTACAAAAATACAGTTA 


SEQ ID NO: 477 


L1MA4A 


GAGGACTTGAAATGTTCCCAACACATAGAAATGATAAATACTCGAGGTGA 


SEQ ID NO: 478 


L1MA5A 


TGGGAAGGGTAGGGGGAAGGGGGGGATAGGGAGAGATTTGTTAAAGGATA 


SEQ ID NO: 479 


L1MA6 


ATAGGAGGAATAAGTTCTGGTGTTCTATTGGACAGTAGGGTGACTATAGT


SEQ ID NO: 480 


L1MA7 


ATGGGGAGATGTTGGTCAAAGGGTACAAAGTTTCAGTTAGACAGGAGGAA 


SEQ ID NO: 481 


L1MA8 


TGCTNATGGTCGGATGACTGGCCACTCTGTGAACACAGTAAACAAGTTTG


SEQ ID NO: 482 


L1MB1 


GAAATGGGGAGTTGCTGTTCAATGGGTATAAAGTTTCAGTTATGCAAGAT 


SEQ ID NO: 483


L1MB2 


GGGTATAGAGTTTCAGTTTTGCAAGATGAAAAAGTTCTGGAGATCGGTTG


SEQ ID NO: 484 


L1MB4 


TGGTGATGGTTGCACAACAMTGTGAATGTACTTAATGCCACTGAATTGTA 


SEQ ID NO: 485 


L1MB5 


AGGGGGAATGGGGAGTGACTGCTTAATGGGTACGGGGTTTCCTTTTGGGG


SEQ ID NO: 486 


L1MB8 


GGAATGGGGAGTGACTGCTAATGGGTACGGGGTTTCTTTTGGGGGTGATG 


SEQ ID NO: 487 


L1ME1 


GGTGGGGGNAGGGGATTGACTACAAAGGGGCATGAGGGAACTTTTTGGGG 


SEQ ID NO: 488 


L1ME3 


ATAGTGGTTACCTTTGGGGAGGGTTATTGACTGGGAAGGGGCATGAGGGA 


SEQ ID NO: 489 


L1ME4A 


GACTGGAAGGAAATACACCAAAATGTTAACAGTGGTTATCTCTGGGTGGT 


SEQ ID NO: 490 


L1MC1 


TTGATAGTGGGGGAGGCTGTGCATGTGTGGGGGCAGGGGGTATATGGGAA 


SEQ ID NO: 491 


L1MD3 


ACCCATAACCCCAGTCTAATCATGAGAAAACATCAGACAAACCCAAATTG 


SEQ ID NO: 492 


HAL1B 


AGAGGAGAGGTGGAAGGAAGTATGAGAGTGCTAATNTCCTCATCTTTCAT


SEQ ID NO: 493 


L1MA9 5 


AGACCCAGGGTTCAGGCCTGTCCCAGTAGACCCCAGCACTAGGCTAGTCC 


SEQ ID NO: 494 


L1MDA 5 


AAGAAGGAATCTTGGAACATCAGGAAGGAAGAAAGAACATAGTAAGAAGC 


SEQ ID NO: 495 


L1MEB 5 


GGCAGAAACTGGAGGGGAGTCGACACCTGGAAGAAGGGAATWGCACGGAG 


SEQ ID NO: 496 


TIGGER5A 


TTAAGGTAGGCTAGGCTAAGCTATGATGTTCGGTAGGTTAGGTGTATTAA 


SEQ ID NO: 497 


TIGGER6B 


AGGCAACCCCATCAAGAACTTANGCGAAAAAAGATGTAGGATCACAAAGT 


SEQ ID NO: 498 


TIGGER7 


TCGGATGGAACGCAGCATTAAAGTCACCCATATGATCAATGAAGGATTAC 


SEQ ID NO: 499 


MER44D 


CCTCACTTCATCTGATCACGTAGGCATTTTATCATCTCACATCATCACAA


SEQ ID NO: 500 
MER69C


ATCGACGAAGATAACATAAAACTCATAATACGCCACTACAACGAGGACAT 


SEQ ID NO: 501 


MER106B 


TATTTATGTTTGATCCTCAGTGCTTTGTGTGACTTGGGCTTTGAGAATTA


SEQ ID NO: 502 


CHARLIE2A 


GATTGGTTTGACAATGAGGACTGGCTTTGCCAATTAGGTTATATGGCAGA


SEQ ID NO: 503 


CHARLIE2B 


TTAATNCACCTTTTGTAAGCCCTATACTTACTAGTGGCCCAATACCTTCT 


SEQ ID NO: 504 


CHARLIE7


ACTTAGAACCAGACCTTCGAATCGCTGTATCACAAAGTGTTAAACCAAGA 


SEQ ID NO: 505 


CHARLIE8


ATTTATGTTACCTGCCTGGCCCCTGTAGGCATTTGAGTTTGCGACCCCTG


SEQ ID NO: 506 


CHARLIE8A


ATTTATGTTACCTGCCTGGCCCCTGTAGGCATTTGAGTTTGCGACCCCTG


SEQ ID NO: 507 


MER63D 


ACAATGTAACGGCTACAGACACGACACACTTTTAAGTTTAATCTGCATTA


SEQ ID NO: 508 


MER97A 


TGTTAAAAAATGATCCGCTCTGGGTGTCGAATACGCTAGGTACGCCACTG 


SEQ ID NO: 509 


MER97B 


CCAGTGGTATGhrrTTWGTAGTTGCCTAAATTGTACCTTTTGCAGACGI 1 1 


SEQ ID NO: 510 


MER97C 


TGTTAAAAAATGATCCGCTCTGGGTGTCGAATACGCTAGGTACGCCACTG 


SEQ ID NO: 511 


MER6B 


GTTCTTGGAAACTGCGACTTTAAGCGAAACGACGTACAGCAGGTCCTCGA 


SEQ ID NO: 512 


ZAPHOD 


ATTGCCGGCCCATCAACAGAACACCCAGACATGTGCAATAATAATTAAAT 


SEQ ID NO: 513 


TIGGER9


GCCAGTCAGATTTCACGGCANTGCCAATGTTTGTGTCTGTACAGCGNTGT


SEQ ID NO: 514 


HERVL66I


CTCCTGTGCTTACCCTGTATCTGTAATCTATATCAACTATGCCTTCCCCA 


SEQ ID NO: 515 


THE1A 


TTTATCAGGGGTTTCCGCTTTTGCTTCTTCCTCATTTTCCTCTTGCCGCC 


SEQ ID NO: 516 


THE1C 


GTGTCCCCACCCAAATCTCATCTTGAATTGTAGTTCCCATAATCCCCACG 


SEQ ID NO: 517 


MSTB 


TGTTAGTTCACGCGAGATCTGGTTGTTTAAAAGAGTNTGGCACCTCCCCC


SEQ ID NO: 518 


MSTB1 


CTTCCTCTCTCGCCATGTGATCTCTGCACACGCCGGCTCCCCTTCACCTT 


SEQ ID NO: 519 


MLT1AR 


TCAGTCTGCTCCCTATCTTCGGCTGCCCG 1 1 1 AGIMTGTGGCTCAAGTGGG 


SEQ ID NO: 520 


MLT1CR 


AAGGTGCGGCCTGGTTTCTCCTTGCTGCTTATAGTAAAATGCGAGAGGAA


SEQ ID NO: 521 


MER104B 


CCTTTCGCGTTTCAGTTAACAAACCATTTAAGGACCATTTGAGGAAGGAA 


SEQ ID NO: 522 


MER104C 


TGAAGGCAGGAGAAATTGCCNAATCCCNCGGAATAGATGAAAGAAATTTC


SEQ ID NO: 523 


HSTC2 


TNATGTAGACTCCTTCGCAAGACTCCATCAGCGAACCATTTGACACTTTT


SEQ ID NO: 524 


L2A 


ACGCTCTTCCCCCAGATATCCACGTGGCTSGCTCCYTCACCTC^ATTCAGG 


SEQ ID NO: 525 


ion 


CCTGCCACTCTGGGTTATMATTGTCTGTKNGCANGTCTGTCTCCCCCACT 


SEQ ID NO: 526 


MER51D 


TTTGTTTGGGACACCAAGAGCCTGGAACTGCACRGCACCAKCTGGTAACA 


SEQ ID NO: 527 


MER5C 


TGGACCAGTGCTAGTCTGCAAACTGTTTGTTACCAGTCCATGATAAGATA


SEQ ID NO: 528 


HERVK11DI 


CCCGGTGCTGAAGTTTTAGACGGTATCTCTGAGGGGTTATCTAATCTCAA 


SEQ ID NO: 529 


LTR69 


GAAAAGTCGCCCCTGGGGAAGCTGGTTAACTAGGACCACCCAAGACCCCC 


SEQ ID NO: 530 


HERV30I


AAAAAAGGAGCTTGAACACTCAGAACCCTGAAATATGTTTAACCAATGGA 


SEQ ID NO: 531 


HERV19I 


CATAGCAGGAATAATGGTTACTAACAGAAAATAACACATGGGCCi i luCA 


SEQ ID NO: 532 


LTR19C 


TCACTCTGTGTGTGTGTGTCCGCGACCTCGATCTCCTTGGCCGTGAGACC 


SEQ ID NO: 533 


HERV46I


ACCCACTGCTTCAAAACCCAAACCCTGATTACAGGNCCCGTATTCGGCAG 


SEQ ID NO: 534 


HERV52I 


TNAATAAGACATGGCACATTTCAGTCATCCATCAAAGATCAGGGGTGAAT 


SEQ ID NO: 535 


MER89I 


GCTTCTGCGCAGCCGCTCTCTCATCAGATGATCGCCATGATGATACAACA 


SEQ ID NO: 536 


MER110I


GACAATGGTCTNTCCTTCAGNTCGGGNTGAAGAATGACCAAAGGAGAAAT 


SEQ ID NO: 537 


MER21I 


ATCCTTGTTTCGNTGTAAGGGATTCAGTGGTTGGAAANCAGGGAGTGGCC 


SEQ ID NO: 538 


PABL AI


GCGCTCAAAGGGTGAGTTAACTGGATCGTATGCCGGGAGCOlAl llsi 1 1 i 


SEQ ID NO: 539 


PABL BI


CTCGCGGTGCTGGCCATCCTTGNAGGCATGGGCATAACGTTATGTTGTGG 


SEQ ID NO: 540 


MER52AI 


ACNCCCANGGGATTATCTACTCCCCTAAACAGCTATCTCTCTTCTAAAGT 


SEQ ID NO: 541 


HERV57I 


AGCCATGGCTATACGTTATAGACCTGTATAGTTCTTCCCCTCATACCCTA 


SEQ ID NO: 542 


MER70I


GGGCATATGAAATGGACTAGCTTTGCTAAGGGGGATATCTGGGTTGGGGG 


SEQ ID NO: 543 


HERV38I 


CGGGATCGGTTTGGAGTGCTCCGTCTGCATCGGATCCGTCTGTGTTTGTG


SEQ ID NO: 544 


L1M2B 5 


CTTTCCCTACCCACTGCGACTACNYCTGACTCTGGGGCCAAAGCACATGC


SEQ ID NO: 545 


L1M2C 5 


ACACCCCAATGAACTGACACCAAGACCCAI i lAIACAAAlAAiii i i nuL^ 


SEQ ID NO: 546


HERVFH19I 


CTGGAGCAGTCCTCCAAAATAGACGGGGATTAGATCTTATAACGGCTGAA


SEQ ID NO: 547 


HERV70 1 


CTCAGTGGCAGATGGTAGAG(3rGAAoAGAeisANow\LfAU I aoumm^umvjo 


SEQ ID NO: 548


LTR70 


TCTTTGCTCCCAGGTTAYAATCCTNAAGCTTGRCCCAAATAAACTGTCTA 


SEQ ID NO: 549 


MER120 


AGATGTGGATACTCAAGATTTCTATTGGGGAAAACTGTGGTCCTTAGTAA


SEQ ID NO: 550 


REP522 


TGTATTGCTGGCAGCAGTGAGGTGGGTTAAGGGTGCTATCCGGGGCTGCA 


SEQ ID NO: 551 


LTR71A 


TTAAAAGTCTCGCTTCCACTGTTCTTCGTGTCTCTGAGTCCATICI 1 IGG 


SEQ ID NO: 552 


LTR71 B 


CATTAAAAGTCTCACTTTCGCTGTTCTCCGGGTCTCTGAGTCUAl lUl 1 1 


SEQ ID NO: 553 


LTR12B 


CCCACCAGAAGGAAGAAACTCCGGACACATCTGAACATCTGAAGGAACAA 


SEQ ID NO: 554 
SEQ ID NO: 555 


MER121 
MER122 


AGCACTTTTTTCCCCCCTTAATTTTTAAACCCATG1GTAI 1 IGAAGGGAA 
TGCAGTTGGTGGCGACAGAGACTGTAGTGTGGCTGGAGTGGTAGGAAGGG 


SEQ ID NO: 556 


LTR7A 


AAAGCTTTATTGCTCACACAAAGCCTGTTTGGTGGTCTCTTCACAGGGAG 


SEQ ID NO: 557 


LTR7B 


ACAGCCTTGTTGCTCACACAAAGCCTGTTTGGTGGTCTCTTCACACGGAC 


SEQ ID NO: 558 


MER51E 


GATTAGGCAGCAYACAGGCCACATCCTCACTCCTGTGATAACAAGACAGA 


SEQ ID NO: 559 


MER41F 


CAGGAGAATAGAAAATTCCAGGCAGCAGTTTCACATGACTAGCAAAAGGA 


SEQ ID NO: 560 


LTR2C 


AAGATAAATAGCCAGACAACCTTGGCACCACCACCYGGCCCTAGGAGTTA 


" SEQ ID NO: 561 


LTR38C 


ACACCTCACTCTTGTTATTTTGGCTTCTTTCTACAAGCGGCAAGCAGGYG 


SEQ ID NO: 562 


LTR72 


AACCTGTATTCTCATGGAGAGTCGTTTGTTACTCACCAGGYGAATRAACC 


SEQ ID NO: 563 


MER65D 


TAAAAGCTTCCCTTTACCCTCCCCTCTTCAGATGCATCTGTGGCTTGCCA 


SEQ ID NO: 564 


ALR1 


TGAGGCCTTCGTTGGAAACGGGATTTCTTCATATAATGCTAGACAGAAGA 


SEQ ID NO: 565 


LTR1C 


GGTTCCAGCATTCATTCGCTCCGGTTCCCGCACTCACTCGCTTGCATGCT 


SEQ ID NO: 566 


LTR45C 


TCTCACAAGCAGAGGGAGI 1 1 CAGCATTTCAGCAAGTI G 1 ITCl M lufT 


SEQ ID NO: 567 


LTR76 


GATGTTAAGTCTGCTGGGTCTGAGTGCACTCAATAAAAGATCCTCCTGTT 


SEQ ID NO: 568 


MER72B 


" TTTCACAATGCATCCCTTCCTAAAAACTGACCACCATCrCTGGACTGGTT 


SEQ ID NO: 569 


ALR2 


GTGAAGGGATATTTGGGAGCTCATTGAGGCCTATGGTGAAAAAGAAAATA 


SEQ ID NO: 570 


LTR1D 


GTTCCAGCACTCATGCACTCCAGTTCCCACCTCGTTGACTCACATGGTCC


SEQ ID NO: 571


TCCTGGTCACCTCCCCATAACTGGCCTTCCCCACACCGTTCl Hui i iGT


SEQ ID NO: 572 


MER50B


ACTCCCTAAACACACTGCGCGTGCTCAATTCCCAAGGGTAAGGAGGGCAC 


SEQ ID NO: 573 


HERVP71A I 


AATTGTGGCAGGAGTCTTAACAGCAGTGGGATGTTGTATTATCCCTTGTG 


SEQ ID NO: 574 


LTR27B 


TTTGCCCACCCTTTCCCGAnGAI ICITTCTGAATAATGGCI 1 I lAACCA 


SEQ ID NO: 575 


LTR12C 


CACCAGAAGGAAGAAACTCCGAACACATCCGAACATCAGAAGGAACAAAC 


SEQ ID NO: 576 


LTR43B 


CAGTCGGTGCTGTCTCACYYTTGAGCAGCCNYGCTCTGACTCAGCTGTGA 


SEQ ID NO: 577 


LTR72B 


CCCTTGTTAAATCCTCGrrGGTTGTGGTCATTGGACTGTCACCTGCCAAG 


SEQ ID NO: 578 


LTR77 


GGGACAAGAACTCAGACCTTGCTAAACTAAGGAGTAAGAAGACTGCAACA 


SEQ ID NO: 579 


L1PREC1 


GTCAAAGTGCTTCATTAAATGGGTCCTGTTCCCTGTGCCACCCAACTGGG 


SEQ ID NO: 580 


MER2B 


TCATTCAGGTGGATTCAATGTAGTACTYGGTGTATGGCAAATTCAAGi I I 


SEQ ID NO: 581 


MER93B 


CTATAAAAGCCTCCCCCTTGCATTCCCTCGGTGGAGCTCCCGAACCACTT 


SEQ ID NO: 582 


SATR2 


TGTACACCCTGTGATATTATTCGTAATATCCTAGGGGGATGTTACTCCTA 


SEQ ID NO: 583 


ROLEM C 


GGGNAAATGANTGATATTCAGTAATGGTGCTGGGACATTTGGI 1 1 ICCAI 


SEQ ID NO: 584 


MLT1A1 


CCCCTCTAGAGGATGCAGCATWCAAGGYGCCATCTTGGAAGCAGAGASCA 


SEQ ID NO: 585 


L1PREC2 


TGGCTGAACACTCCCAGTAACAGTGGC 1 C 1 GUGTTTCTCGGAGGTGGAGC 


SEQ ID NO: 586 


BLACKJACK 


CATCCAAACAAGCTGCGATATTCTACCCAACGATATAGAAGCTGTAGTTG 


SEQ ID NO: 587 


I1M2A1 5 


GCCCACCCAACCCATCACAGCTTCCAGCAACACCAACATGGACTGCTTGG 


SEQ ID NO: 588 




TGGAAGAGGATTCTAAGCCTCAGATGAGAACACAGCCCTAGCCAACACCT 


SEQ ID NO: 589 


MER4E1 


TTCTTCCAGACCCTCCGAATCCTAAAGAGATTAACTAAGATCTGAATAGG 


SEQ ID NO: 590 


PRIMA4J 


CGTGACCTCCTAGGAATGAGCCTTCCTAGTGATGTGGGACCTAAACTTCT 


SEQ ID NO: 591 


PRIMA4 LTR 


TTTAAATTTGGAGCCCTGAAAATCATCTTCGGAGAAAGGCATAGACCTGT 


SEQ ID NO: 592 


L1M4B 


AAAACAANCACNANGAGCCGGGGGNGGGGAATCAGTATCCAGAGTTGCTA 


SEQ ID NO: 593 


L1PA14_5 


CACACAGACAGCAGATTAGGGCTAACCTGGCAAGGATACAGCTTGTCTGC 


SEQ ID NO: 594 


LTR13A 


"TCTCTTTGTCTTGTGTCT T I AT7TATTACAATCTCTCGTCTCCGCACACG 


SEQ ID NO: 595 


HALIC 


AACCACAACATNAGAGGACCCANCACTCCTCCTACGACGAAAACAAAACC 


SEQ ID NO: 596 




AGAGGCTCATAGAAATGGCACTTACTAAAACCTCCCTTAACTATCCTCCA 


SEQ ID NO: 597 


MLT1F2 


CNGATCCTCCCCTCNAGTTGAGCCTTGAGATGAGACTGCAGTCCTGGCTG 


SEQ ID NO: 598 


MLT1FR 


TTTGGACCCCCAAAATTCTACTGGCAGGAAGCAGGCTGAGAAAACTACTC 


SEQ ID NO: 599 




CAGAGGCTCATAAAAACGGCACTTACTAAAACCTCCCTTAACTATCCTCC 


SEQ ID NO: 600 


LTR10F


TTCCCTCCCTTGTCCAGGTGTGCGCTCACCATTGCTCCATCTGTGAGGGT


SEQ ID NO: 601 


IV* c r\o*T 


CTAAAGACACTTTGTGCTCAGACCTAGAAATCTTCTCAATTGGCTGCCAT


SEQ ID NO: 602 


MFR57A 1 


CTGGAAGGCCTATGCACCTAATAATAGAACCTCATGTATCTTCCGCTACT 


SEQ ID NO: 603 


PRiMAX 1 


AATTAACCAAGGCrmAAAATTCCTTGGCCAAAAGCTCTTCCATTGGTT 


SEQ ID NO: 604 


MER75B 


CATTTCCCGTTTGCCCCAAGAATACTCTTGTCTCTAATCCTAATGTAACA 


SEQ ID NO: 605 


MLT2B3 


CCCAGGTGGTTTGGCATTTGArrAGAATGATTGGGCTGCCCCAGGTGTGT 


SEQ ID NO: 606 


MER66C 


AGGATCTGGTCC^GACAGGATAAAGTGAAGAAACNRGCAGGAACCAGCAG 


SEQ ID NO: 607 


MER52D 


CACNGCTCCACACCTGRGTTNNCCTTGGCAGGNNTGGATCNAGGNCCTTG 


SEQ ID NO: 608 


MER41G


TGCTTTGCAATAAAAGCTTCnTGCCTTTCGCTTCATTCTGACTCATCCCT 


SEQ ID NO: 609 


MER21C


AGGAGCATCnTTGTTCTAATATTTGGrCI 1 1 GACCGTAGTTCCTG ACAC 


SEQ ID NO: 610 


1 TR2QC 


CCAACCTCACCCTrTGTGTCCATGCTCCTTAATTTTCTTGGTTGTGAGAC 


SEQ ID NO: 611 


1 1PRA1 S 


TCTGTTTGCGGGAGAAGTTTCTGACTrTACCTGGAGCTGAGTCAAKll lAG 


SEQ ID NO: 612 


1 iMRA ^ 


AATCTCATGTCAAAAAAACACTAGCTGAAGACAAGCTAAGGAACAGAGAC 


SEQ ID NO: 613 


I. 1 i\f o 


TTGACACTCACTTTCGGTTTTGTGTATTGGCTTCGTGACACCAAACAGGG 


SEQ ID NO: 614 


MARi FDI IINI 

TR 


GGGAGGAGACCACCCCTCATATTGTCTTATGCCCAATTTCTGCCTCCAAA 


SEQ ID NO: 615 


LTR12D 


CACCAGAAGGAAGAAACTCCGGACACATCTGAACATCTGAAGGAACAAAC 


SEQ ID NO: 616 


LTR12E 


CACTCCTGAAGTCAGCGAGACGACGAACCCAGGGGGAGGAACAAACAACT 


SEQ ID NO: 617 


MLT2B4 


GTAAGAGAGAATTCCTCCTGCCTGACIGCCI 1 1 GAACTGGGACATCGGTC 


SEQ ID NO: 618 


MER9B 


'TAACAACATGTTTTTGCTGCAGATAATCAGCCAGAGCC"IGI I ICICTRCT 


SEQ ID NO: 619 


SVA2 


GAAGTGACAGCCTTGTGTGTGATCTTTGTGCCCTCCCCAAGi i iviUAi i i 


SEQ ID NO: 620 


HERV39 


TCTTGCTGCTAAAACTGCATACAACAGCCACCCAGCCAAGAGGAATTAAT 


SEQ ID NO: 621 


MLT1H2


CCCAGCTGCCATGCTAAAAGAAGCTCAGGCTAGACTATTGGATGATGAGA 


SEQ ID NO: 622 


LTR10G


GCTGAGAAAACTTTTGCCTGAGTGCIGGI 1 ICACI 1 1 GCGGCACCAAGCA 


SEQ ID NO: 623 


MER4A1


CAGAAACTCAAAAGAATGCAACCATTTGTCTCTCACCTACCTGTGACCTG 


SEQ ID NO: 624 


MER4D1


CTCTAGTATAGCATCACATGACAGATAGCAGGCCCTGAAAGAAATCAAAG 


SEQ ID NO: 625 


THE1D


CNTCTCTCTCCTGCCGCCTTGTGAAGAAGGTGCTTGCTTCCCC! 1 IGCCI 


SEQ ID NO: 626 


LTR5B 


CCTCCGTATGCTGAGCGCCGGTCCCCTGGGCCCACIGI ICI ITCTCTATA 


SEQ ID NO: 627 


MP PAR 


TTGAGTATCCCTTATCCAAAATGGTTGGGACCAGAAGTGTTTCGUAI i lu 


SEQ ID NO: 628 


CHARLIE4 


GTGACTCCACATGTTAATGGTCTTATTCAAGCTAAGCAGCATCTACTATC 


SEQ ID NO: 629 


OMAPI IFQ 


CGTTGCAACGTGCACAGTTCATGCTAAGGATCCGTGCGATGCACTCTGAT 


SEQ ID NO: 630 




NGTCNATTGTTTGACTTTCACACATTCGACTTCCATACACGI 1 i ICAGGA 


SEQ ID NO: 631


MFR^iAl 


TACTGAATCAGAATCTGCGTTTTAACAAGATCCCCAGGTGATTCATATGC 


SEQ ID NO: 632 


t^AMf^A9 A 
l\/-\IN \Dr\C r\ 


TTGGCCANAAAACTTTTNTTGAATCTTCTCATTGGGAAAATTGGGAGATC 


SEQ ID NO: 633 


FORDPREFECT


TTCACGTGCACTGATTGGACAATAAACAAATACGTAAGTACCTCI rCTCT 


SEQ ID NO: 634 


FORDPREFECT_A


ACTTAGAAAATTTCGAGGAAGGCACTCCAAAGCACGGGGTCCCCTGAGGC 


SEQ ID NO: 635 


LTR16E 


ACGCATCACCTTGCATTGGTTCCCATCCTTCCCTGCCTCACTTCCCI 1 1 1 


SEQ ID NO: 636 


L1PA17 5 


CGAAGCCAAACGATCATACACAACATAGACCACAGTCATACCCTCAAGGG 


SEQ ID NO: 637 


CHARLIE10 


AGTAGCGCTGTCATCAATGCAACCTAGATTAGATAAGTTAACAAGCAAGA 


SEQ ID NO: 638 


THE1B 


CGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATTAA 


SEQ ID NO: 639 


MSTA 


ATGATTGTAAGTTTCCTGAGGCCTCCCCAGAAGCCGAGCAGATGCCAGCA 


SEQ ID NO: 640


MSTC


ATGCGGCCCCTCGACCTTGGACTTCCCAGCCTCCAGAACTGTAAGAAATA 


SEQ ID NO: 641 


MLT1A 


GCCGTCTACGAACCAGGGAATGAGCCCTGACCAGAAACTGAATCTGCCGG 


SEQ ID NO: 642 


MLT1B 


GGCATCTACAAGCCAAGGAGAGAGGCXTCAGAAGAAACCAACCCTGCCGA 


SEQ ID NO: 643 


MLT1C 


CATGGAACAGATTGTCGCTCACAGCCCTCAGAAGGAACCAACCCTGCCGA 


SEQ ID NO: 644 


MLT1D 


TAGCCCAGTGAGACCCATTTCGGACTTCTGAGCTCCAGAACTGTAAGATA 


SEQ ID NO: 645 


MLT1E 


TTGTGAGACCCTGAAGCAGAGGACCCAGCTAAGCTGTGCCCGGACTCCTG 


SEQ ID NO: 646 


MLT1F 


CATCTTGACTGCAACCTCATGAGAGACCCTGAGCCAGAACCACCCAGCTA 


SEQ ID NO: 647 


MLT2A1 


GTTCTTCAGTTTTGGGACTCGGACTGGCTCTCCTTGCTCCTCAGCTTGCA 


SEQ ID NO: 648 


MLT2B2 


TCACGTGAGGCAATTCCCCTAATAAATCYCYTCTATCCATCCTATTGGTT 


SEQ ID NO: 649 


MLT2C2 


CCACAATCGCGTGAGCCAATTCCTTAAAATAAATCTCTCTCTACACACAC 


SEQ ID NO: 650 


MLT2D 


TCTGCCTGCCTGATNGTCTTCGAAGTGGAATATCAGCTCTGCGGATTTTG 


SEQ ID NO: 651 


MER4A 


TAAAASCAAGCTGTRCCCCGACCACCTTGGGCACATGTCGTCAGGACCTC 


SEQ ID NO: 652 


MER4B 


CTAAAATGTATAAAASCAAGCTGTRCCCCGACCACCTTGGGCACATGTKG 


SEQ ID NO: 653 


MER4C 


ATTGAAGCCCTCAAAATCATCTTTGGAGAAAGGCACAGACCAGAGATGTT 


SEQ ID NO: 654 


MER9 


GCTGTGAGACCCCTGATTTCGCACTTCACACCTGTATATnrCTGTGTGTG 


SEQ ID NO: 655 


MER11A 


CACGGTCCTACCGATATGTGATGTCACCCCYGGAGGCCCAGCTGTAAAAT 


SEQ ID NO: 656 


MER11B 


CCGGATRCCCAGCTTTAAAATTTCTCTCTTTTGTACTCTGTCCCTTTATT 


SEQ ID NO: 657 


MER39 


GGTCTTTGGGTCrrCATTTCTGAAGGCTCCCATGTCACGTAAAACTTTGA 


SEQ ID NO: 658 


MER48 


TGTTGTTGTGGACGCGCTCTCGGGGTTSGAACCGAYACAAGARCCTTACA 


SEQ ID NO: 659 


LOR1 


TCTTCCTTGGCAATAMTYRTTGTCTGAGTGATTGGCTTTCTGTGCAGTGA 


SEQ ID NO: 660 


MER49 


TGCGGGATGGCCACCTTGCAGGCTGTAACCCTTTATAAGAAATAAAGTCT 


SEQ ID NO: 661 


MER39B 


TGCCI rn CTCCWATTAATCTGCCI 1 1 1 GTSAGTTGATTTTTCAGTGAAM 


SEQ ID NO: 662 


MER61 


AAGCCTAAVVTTTTCGTGGCCGTGTGACAAGGAGCCCGTCTTTAGCTGAAC 


SEQ ID NO: 663 


MER31 


CCTGTACCTATCGCAATGGTCCTGAATAAAGTCTGGCTTACCGTGCTTTA 


SEQ ID NO: 664 


MER34* 


GCCGGAAACTCTAAGAGGGTAGAGGWAAAATrTTrCCTTCYCTNCCATGG 


SEQ ID NO: 665 


MER41C 


TTTACACTGTGGAATCACCCTGAATTCTTTCTTGGATGAGATCCAAGAAC 


SEQ ID NO: 666 


MER50 


TGCTCTAAAACTTGCCTCGGTCTCI 1 1 I 1 CTGCCTTATGCCCCTCAGTCG 


SEQ ID NO: 667 


MER65A 


GAATATGCACATAGTTTACTATGGCACGCGTATTCCCATTGCAATGCTCT 


SEQ ID NO: 668 


MER65B 


GTGTATGCCCCAAATTGCAATTCTGTTCTTCACATGTTATTCCCAAATAA 


SEQ ID NO: 669 


MER66A 


AGCGGCTTCAATAAAAGTTGCTGTCTAATACCACCARCTCGCCCTTGAAT 


SEQ ID NO: 670 


MER66B 


AGCCGCTTCAATAAAAGTTGCTGTCTAATACCACCARCTCGCCCTTGAAT 


SEQ ID NO: 671 


MER67A 


ATTCTCCCTTTAAAACGCCCAGTCACCTCTGCACAAATCGAAGCTGAGCT 


SEQ ID NO: 672 


MER67B 


CCTCATrCTCCCTTTAAAACGCCCAGTCACCTCTGCACAAATTGGAATGG 


SEQ ID NO: 673 


MER67C 


TAGCAGATTGCCTGTGATGCGCATCACATTCTGGTTTAATGCTTATTCAA 


SEQ ID NO: 674 


MER68A 


CCTGTGAGTCCTCCTAGCGAATCACCGAACCTGGGGGTGGTCTTGGGAAC 


SEQ ID NO: 675 


MER68B 


TTGGCTTTGCTGATCTrGCCGTGTATCCTTACNRTGTCGCTGTAATAAAT 


SEQ ID NO: 676 


MER70A 


TGTTGTGTCTCACCGGACTCAGAGAAGTTGGTAACCAGTGCAGAGTGAAC 


SEQ ID NO: 677 


MER70B 


TCNGACCCCTATTCCTGGTGGTTGGCATAGTGATGATCTTTGCTATTCTC 


SEQ ID NO: 678 


MER72 


GCTGCAACCGTTTATGAGAAATAAAGCTCTCCTTTCCAAATTTATGAACC 


SEQ ID NO: 679 


MER73 


GGTGACGGGGTACGACTGGGTTTCAAACAACTTATGTCAGGCCTAAAAAT 


SEQ ID NO: 680 


MER74 


AAGGATGATTAATACAAKYTGCTCTGTGATGAACGGATGCCAAATAGWCG 


SEQ ID NO: 681 


MER76 


TGTTGCCTTAATCGGCTNCTCTGACACCCGGCAGCTCAGCTGTCTGTCCA 


SEQ ID NO: 682 


MER77 


CTTCTAGCGAATCACTGAACCTGAGGGTGGTCTTGGGGACCCCCGACACA 


SEQ ID NO: 683 


MLT1G 


GCGTGTTGACTGCGCCGATACCACGTGGGACAGAGAWGAACTRCCCAGCT 


SEQ ID NO: 684 


PABL A 


AATAAAAACTCTCTTCCTCCCCAGTTCATCTGCATCTCGTTATTGGGCCA


SEQ ID NO: 685 


PABL B 


CCAGTTCATGTGCATCTCGTTATTGGGCCAGGAGAATAAGCAGCCCGACC 


SEQ ID NO: 686 


MER41D 


ATAAACTTGCTCTTCTGAGTGTACTCCGCAACTCGCCTTGAATTCCTTCC 


SEQ ID NO: 687 


MER51A 


CTCTGCTTTTGTT6CTTCATTC ri I CCTTGCTTTGTTTGTGCGTTTTGTC 


SEQ ID NO: 688 


MER51B 


CTCTGCTTTTGTTGCTTCATTCTTTCCTTGGI \ 1 GTTTGTGCGTTTTGTC 


SEQ ID NO: 689 


MER57A 


ATCTTCTACCACATGGCTGCACTGGAGTCTCTGAACCTACTCTGGTTCTG


SEQ ID NO: 690 


MER57B 


TATAAATTTGTTCCGACCACGAGGCATCCCTGGAGTCTGTGTGAATCTGC


SEQ ID NO: 691 


MER65C 


ACCTCCAACCTTCTCTTTGTTCTTTGGACATACCGAAGAGCACGTGGTCT 


SEQ ID NO: 692 


MER83 


ACAACTGTCTTGGTAAAI lAl 1 1 1 1 ACCTCCCGCGCCACCGGCCCCAGAT 


SEQ ID NO: 693 


MER54 


TGAAAGATACACTGTAAACACCCACAACCAMCTTCCCTGGAGCCCCATCA 


SEQ ID NO: 694 


MER87 


ACTTACTGGCTGTCGWGCGGTGAGGAGTACCAGCTTTeeATTCAGTTACA 


SEQ ID NO: 695 


MER74A 


AATGGCAGTCGTGTCCTGATCTGTTGGCCTTACCATACCTGAATAATAAT 


SEQ ID NO: 696 


MER74B 


CTTTTCAATGGGAGTCGTCTCCTGATCTGTTGGCCTTACCATACCTSAAT 


SEQ ID NO: 697 


MER88 


AGGGGAACTTGTGGCAGGGACCAGCCTTATCACACTGGTGCACCTGGTCA


SEQ ID NO: 698 


MER54B 


AGCCATTTGGGTGTGGTGTAGAACTGGAAACTGTGTCAAGGGTGACTGAG 


SEQ ID NO: 699 


MER31A 


AAATTCCCACTTGCCCATGCTGTATTCGGAGTTGAGCCCAATCTCTCTCC 


SEQ ID NO: 700 


MER31B 


TCCCCACTTGTCCTTGCTGTATTCGGAGTTGAGCCCAATCTCTCTCCCCT 


SEQ ID NO: 701 


MER67D 


ATCCACCTGCCI M IGl 1 ICAGNGGAGTTGAGTTCAANCTCTAACCCCTA 


SEQ ID NO: 702 


MER11C 


TTGTACTCTGTCCCTTTATTTCTCAAGCCAGCCGACGCTTAGGGAAAATA 


SEQ ID NO: 703 


MER11D 


ACTATCTTGTGTGTGTCTATTATTTCTCAACCTGCCGATCCGCCTAGGAG 


SEQ ID NO: 704 


MER61B 


CGCCCAATAAATTCTGCTCCTCACGCTTCAATGTGTCCGCGWGCCTAATC 


SEQ ID NO: 705 


MER61C 


GKGACAAGAACCCGGGTTTTAGCTGAACTAAGGAGCAAAATYCTGCAWCA 


SEQ ID NO: 706


MER92A 


GTTCCTGAGGTCGGAGCGTTCTCCCTATTGCAATAGTGTn I I GAATAAA 


SEQ ID NO: 707 


MER92B 


TTCTGCGTGAACTTTGAGATGCTTGCAGATCTTATGGTCAGAGCGTTCTC 


SEQ ID NO: 708 


MER92C 


TATCTACCCCTTCCTATAAAAGTCCAAGGCAAAACCACCCTGCCGAGACA 


SEQ ID NO: 709 


MER93 


CTTCCTCATNCACCYTATAAAAGCCTTTCCTTCAAGCCCCTCCGGCGGAG 


SEQ ID NO: 710 


MLT1H 


cacagatgcatgagggagcccagccgagacgagaagaaccacxx:agctga 


SEQ ID NO: 711


MER89


AAGCTCTGAATAAATAGCCTrrGCTTGTTCTCATTTGGKTGGTCTTCATT 


SEQ ID NO: 712 


MERSO 


CCTCGCTGCARCGAGCAATAAACCCAACTTGTTCAACCACAGGTGTGTTG 


SEQ ID NO: 713 


MLT2A2 


TGTGGGACTTCACCTTGTGATCGTGTGAGTCAATACTCCTTAATAAACTC 


SEQ ID NO: 714 


MLT1I 


GAGCAGAGCCCCAGCCGACCCGCGATGGACATGTAGCATGAGCAAGAAAT 


SEQ ID NO: 715 


MER52B 


GCCACAGAGGTTTCCGGCCAGAAAAGCGACACCCCAAGGATCCCATGACA 


SEQ ID NO: 716 


MER52C 


ACACTAAATAAAGCTCTTCTTCGTCTTCTTCACCCTTCACTTGTGTGCGT 


SEQ ID NO: 717 


MER95 


TTGARGTCTCCCGGTTCGCGARGTGTWCTTTCTCTYATTGTATGCACAAT 


SEQ ID NO: 718 


MLT1J 


ATGGAGCAGAGCTGCCATACCAGCCCTGGACTGCCTACGTCTAGACTTCT 


SEQ ID NO: 719 


MLT1K 


AGCTACCCCTGGACTmCAGTTACGTGAACCAATAAATTCCCTI 1 1 1 IG 


SEQ ID NO: 720 


MER101 


TTCGTnTACACCGAAGGCTGCATCTCCCCGGTTTGCAAACTGTTCACTG 


SEQ ID NO: 721 


MER41E 


TTTCTGACTCATCCTTGAATTCCTTCTCGCGATGGTGTCAAGAGCCTGGA 


SEQ ID NO: 722 


MLT2E 


TCCCCCCTCCAGACCTTCACTTCCCCAGCTCCTCCCACAATTGTATAAGG 


SEQ ID NO: 723 


MLT1E1 


TGATTTCAGCGTTGTGAGACCCTGAGCAGAGGACCCAGCTAAGCCGTGCC 


SEQ ID NO: 724 


MLT1J1 


AGCCACTGTACArri lGGGGI 1 lAI rTGTTACAGCAGCTAGCGTTACCTT 


SEQ ID NO: 725 


MLT1J2 


CCTGAGTCACTACNTGGAGGAGAGCCACCCACACCCGACCAGAACCCNCA 


SEQ ID NO: 726 


MLT1E2 


TTGATTTCGGCCTTGTGAGACCCTGAGCAGAGAACCCAGCCGAGCCCACC 


SEQ ID NO: 727 


MLT1G1 


TGCGCAAATTGCAGATTCGTGAGGAAAATAAATGATTGTTGT1GI 1 ITAA 


SEQ ID NO: 728 


MER110 


CTCAGCTTTGCTTGATCAACAGGTTTTNTTTTCTGGTGGTGI I 1 ITGGGG 


SEQ ID NO: 729 


MER110A 


TGGTGCTCYCCCTTACCACAGTAAGCAATAAACTCAGCTTTGTCTTATCA 


SEQ ID NO: 730 


MLT1F1 


GAGAGACCCTGAGCCAGAACCACCCAGCTAAGCTGCTGCCGAATTCCTGA 


SEQ ID NO: 731 


MER101B 


GGCTGTGTCTCCCTGGI 1 IGCAAACTGTTCACTGGAATAAACTCTCCTCC 


SEQ ID NO: 732 


MLT1G2 


GGCTGCTGTGCCCTGTCCGAATTCCTGACCCACAGAATCCGTGAGCATAA 


SEQ ID NO: 733 


MSTA1 


AGATGCTCGCACCATGCTTTTTGTCCAGCCAGCAGAAYTATGAGCCAAAT 


SEQ ID NO: 734 


MLT1G3 


AGCGTTCAAGTCTTCCCAGCTGAGGCCCCAGACATCATGGAGCAGAGACA 


SEQ ID NO: 735 


MSTA2 


TGCCCTTGAACTTCCCAGCCTGCAGAACCATGAGCTAAATAAACCTCi i i 


SEQ ID NO: 736 


MLT1C1 


GCCTCCAGAGGGAGCATGGCCCTGCTGACACCTTKGATTTCAGCCCAGTG 


SEQ ID NO: 737 


MSTD 


GATGACGCAGCAAGAAGGCCCTGACCAGATGCCGGCNCCWTGATCTTGGA 


SEQ ID NO: 738 


MER51C 


TGTCGCTTTAATAAATTCCTGCn ICGCTGCTTCGTTGC! GIGI 1 ICATT 


SEQ ID NO: 739 


MER21A 


TGGTGTGAGAGCAGAGGAAAAACACGGTTTGAGAGAGTTTTCCCGAAACA 


SEQ ID NO: 740 


MER34B 


TCTGTCTTTTGTTACAGGGGTCTATTCCAACTAAGAACTTATGAGGGTTG 


SEQ ID NO: 741 


MER54A 


TATCTGGATCGACCACATTGAGGAACTGGGAGGAGGCGGAGAACTGGAAA 


SEQ ID NO: 742 


MER74C 


GCCTTTCATCTATCCGAGTGTCANTGTGTTGTGTCCCGCCATCAAAAGAA 


SEQ ID NO: 743 


THE1A 


CTCATTTTCCTCTTGCCGCCGCCATGTAAGAAGTGCCI 1 ICGCCTCCCGC 


SEQ ID NO: 744 


THE1C 


ATGTGAAGAAGGACGTGTTTGCTTCCCCTTCCGCCATGAl HilAAVal I lu 


SEQ ID NO: 745 


MSTB 


ATGATTGrsIAAGCTTCGTGAGGCCTCACCAGAAGCCGAGCAGATGGCGGCG 


SEQ ID NO: 746 


MSTB1 


GCCATGCTTCTTGTACAGCCTGCAGAACCGTGAGCCAAATAAAUUIUI l I 


SEQ ID NO: 747 


MER51E 


CTGTGGAGTGTACTTTCGCTTCAATAAATCTGTGCTTTCGTTACTNCGTT 


SEQ ID NO: 748 


MER41F 


TGGGTGGCACCACAGTTCCGAGAAATCTTCACGTTTTTCCAGGAATCTTC 


SEQ ID NO: 749 


MER65D 


TAAAAGCTTCCCTTTACCCTCCCCTCTTCAGATGCATCTGTGGCTTGCCA


SEQ ID NO: 750 


MER72B 


TCCrnrTACCCCTCCCTCAAAGTGCTTTGCTCTCAGCTTCTGCCAGAGGC 


SEQ ID NO: 751 


MER34C 


TTGTTACAGGGGTCTGTCCCAGCTAAGAACTATGAAGGGTAGAGAGAAAA 


SEQ ID NO: 752 


MER50B 


GATATGCCGCYGGTAACTCAGGGTAAGTCGGATCTCTTCCACCGGTAACA


SEQ ID NO: 753 


MER93B 


CTATAAAAGCCTCCCCCTTGCATTCCCTCGGTGGAGCTCCCGAACCACTT


SEQ ID NO: 754 


MLT1A1 


CATCTTGGAAGCAGAGASCAGGCCGTCACCAGACACCAAACCTGCTGGNA 


SEQ ID NO: 755 


MLT1E1A 


CTTGTGAGACCCTGAGCAGAGGACCCAGCTAAGCTGTGCCCAGACTCCTG 


SEQ ID NO: 756 


MER4E1 


TCACGGGCCATGGTCACTCATATT 1 GGCTCAGAATAAATCTCTTCAAATA 


SEQ ID NO: 757 


PRIMA4 LTR 


TTTAAATTTGGAGCCCTCAAAATCATCTTCGGAGAAAGGCATAGACCTGT 


SEQ ID NO: 758 


MLT1F2 


ACACCTTGATTGCAGCCTTGTGAGAGACCCTGAGCCAGAAGACCCAACTA 


SEQ ID NO: 759 


MLT2B3 


CTTCTCAGCCTCCATAATCAAGTGAGCCAATTCCCCTAATAAATCGGTTC 


SEQ ID NO: 760 


MER66C 


GAGCAGTACCGTTCAATAAAAGATTGCTGTCTAACACCACTGGCTCACCC 


SEQ ID NO: 761 


MER52D 


GTGAGGCAAAGGHACCACHGGHCACAGAGGI 1 ICrGGCCAGAAAAGBGAC 


SEQ ID NO: 762 


MER41G 


TGCTTTGGAATAAAAGCTTGl IGCGI 1 1 CGCTTCATTCTGACTCATCCCT 


SEQ ID NO: 763 


MER21C 


TGTGGGATCTGATGCTAACTCCAGGGTAGATAGTGTCAGAATTGAATTAA


SEQ ID NO: 764


MLT2B4


CCTGGGTGTGCAGCTTGCGAACTGACCCTGCAGATCTTGGGACTTCTCAG 


SEQ ID NO: 765 


MER9B 


TAAATATGTGGGTCAAACrGIGI 1 rGTGGCTCTCAGCTCTGAAGGCTGTT 


SEQ ID NO: 766 


MLT1H2 


TACAGCATGTGGAGCAGAAGAACGACCCAGCTGAGCCCAGCCAACACAGA 


SEQ ID NO: 767 


MER4A1 


AAAACCAAGCTGTGCTCTGACCACCTTGGGCACATGTCGTCAGGACCTCC 


SEQ ID NO: 768 


MER4D1 


TCANAGGCCATGGTCACTGATAI 1 1 GGCTCAGAATAAATCTCTTCAAATA


SEQ ID NO: 769 


THE1D 


" "tGCTTGCTTCCCCTTTGCCTTCTGCCATGATTGTAAG 1 1 1 CO 1 GAGGCCT 


SEQ ID NO: 770 



The expression patterns of the present invention can be evaluated by utilizing high-density
expression arrays or microarrays. As defined herein, "microarray" can be a chip, a glass
slide or a nylon membrane comprising different types of material, such as, but not limited
to, nucleic acids, proteins or tissue sections. By utilizing microarray technology, a
plurality of transposable element sequences from transposable element families can be
analyzed simultaneously to obtain expression patterns. One of skill in the art can design a
microarray chip or glass slide that contains the representative nucleic acid sequences of all
of the members of a particular transposable element family or the nucleic acid sequences of
select members of a particular transposable element family. An array can also contain the
nucleic acid sequences of selected transposable elements from one or more families. Array
design will vary depending on the transposable element families and the sequences from
these families being analyzed. One of skill in the art will know how to design or select an
array that contains the transposable element sequences associated with a particular type of
cancer. Such microarrays can be obtained from commercial sources such as Affymetrix, or
the microarrays can be synthesized. Methods for synthesizing such arrays containing
nucleic acid sequences are known in the art. See, for example, U.S. Patent No. 6,423,552,
U.S. Patent No. 6,355,432 and U.S. Patent No. 6,420,169, which are hereby incorporated in
their entireties by this reference.

The present invention also provides microarray slides or chips comprising
transposable element sequences or fragments thereof from transposable element families.
As stated above, a microarray slide or chip can contain the representative nucleic acid
sequences of all of the members of one or more transposable element families or the nucleic
acid sequences of select members of one or more transposable element families. The
present invention also provides for a kit comprising a microarray slide or chip of the present
invention for diagnosis of cancer, staging of cancer, other clinical applications and research
applications. Utilizing the methods of the present invention, a chip(s) or glass slide(s) that
specifically detect a type of cancer can be synthesized. For example, if it is known that
transposable element sequences from two families are expressed in prostate cancer, a chip
that contains the necessary transposable element sequences from these two families can be
synthesized, such that one of skill in the art can utilize a kit, containing this chip, for
detecting and staging prostate cancer. Similarly, utilizing the expression patterns of
transposable element sequences for breast cancer, it is possible to manufacture a kit
containing a chip comprising the transposable element sequences involved in breast cancer
in order to diagnose and stage breast cancer. Also, utilizing the expression patterns of
transposable element sequences for ovarian cancer, it is possible to manufacture a kit
containing a chip comprising the transposable element sequences involved in ovarian cancer
in order to diagnose and stage ovarian cancer.

Microarray techniques would be known to one of skill in the art. For example, U.S.
Patent No. 6,410,229 and U.S. Patent No. 6,344,316, both hereby incorporated by this
reference, describe methods of monitoring expression by hybridization to high density
nucleic acid arrays. For example, one skilled in the art would first produce
fluorescent-labeled cDNAs from mRNAs isolated from cancer cells. A mixture of the labeled
cDNAs from the cancer cells is added to an array of oligonucleotides representing a
plurality of known transposable elements, as described above, under conditions that result
in hybridization of the cDNA to complementary-sequence oligonucleotides in the array. The
array is then examined by fluorescence under fluorescence excitation conditions in which
transposable element polynucleotides in the array that are hybridized to cDNAs derived
from the cancer cells can be detected and quantified.
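
By way of non-limiting illustration, the quantification step described above can be
sketched in Python as follows. The sketch assumes that each array spot yields a single
fluorescence intensity and that a spot is scored as expressed when its background-corrected
signal exceeds a threshold. The function name `expression_pattern`, the family groupings
("LTR7", "MER4"), the intensities, the background value and the threshold are hypothetical
and are not taken from this disclosure.

```python
from collections import defaultdict


def expression_pattern(spots, background=0.0, detection_threshold=0.0):
    """Group detected array spots into an expression pattern.

    spots: iterable of (family, member, intensity) tuples, one per array spot.
    Returns {family: {member: corrected signal}} containing only members whose
    background-corrected signal exceeds detection_threshold.
    """
    pattern = defaultdict(dict)
    for family, member, intensity in spots:
        # Subtract an assumed uniform background; clamp negative values to zero.
        signal = max(intensity - background, 0.0)
        if signal > detection_threshold:
            pattern[family][member] = signal
    return dict(pattern)


# Hypothetical intensities for three spots (illustrative values only).
example_spots = [
    ("LTR7", "LTR7A", 1200.0),
    ("LTR7", "LTR7B", 90.0),
    ("MER4", "MER4A", 640.0),
]
example_pattern = expression_pattern(
    example_spots, background=100.0, detection_threshold=50.0
)
```

The resulting mapping records both which families are expressed and which members of each
family are expressed, which is the form of pattern referred to throughout this description.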

The expression patterns of the present invention can also be determined by assaying
for mRNA transcribed from transposable elements, assaying for proteins expressed from an
mRNA, RT-PCR and northern blotting. Particular protein products translated from mRNAs
transcribed from transposable element genes can be detected by utilizing
immunohistochemical techniques, ELISA, 2-D gels, mass spectrometry, Western blotting,
and enzyme assays.

In the present invention, patterns of expression can include one, two, three, four,
five, six, seven, eight, nine, ten, twenty or more families of transposable elements, with at
least one, two, three, four, five, ten, fifteen, twenty, twenty-five, fifty, one hundred, two
hundred, three hundred, four hundred, five hundred, one thousand, two thousand, three
thousand, four thousand, five thousand, six thousand, seven thousand, eight thousand, nine
thousand, ten thousand, twenty thousand, fifty thousand, one hundred thousand, two
hundred thousand, three hundred thousand, four hundred thousand or five hundred
thousand members of each transposable element family being analyzed. For example,
the present invention provides for the determination of an expression pattern of one family
of transposable elements in which one, two, three, four, five, ten, fifteen, twenty,
twenty-five, fifty, one hundred, two hundred, three hundred, four hundred, five hundred, one
thousand, two thousand, three thousand, four thousand, five thousand, six thousand, seven
thousand, eight thousand, nine thousand, ten thousand, twenty thousand, fifty thousand, one
hundred thousand, two hundred thousand, three hundred thousand, four hundred thousand
or five hundred thousand members of a transposable element family are analyzed. The
present invention also provides for the determination of an expression pattern of two
families, wherein one, two, three, four, five, ten, fifteen, twenty, twenty-five, fifty, one
hundred, two hundred, three hundred, four hundred, five hundred, one thousand, two
thousand, three thousand, four thousand, five thousand, six thousand, seven thousand, eight
thousand, nine thousand, ten thousand, twenty thousand, fifty thousand, one hundred
thousand, two hundred thousand, three hundred thousand, four hundred thousand or five
hundred thousand members are analyzed for each family. Similarly, the invention provides
for the determination of an expression pattern of three families, wherein one, two, three,
four, five, ten, fifteen, twenty, twenty-five, fifty, one hundred, two hundred, three hundred,
four hundred, five hundred, one thousand, two thousand, three thousand, four thousand, five
thousand, six thousand, seven thousand, eight thousand, nine thousand, ten thousand,
twenty thousand, fifty thousand, one hundred thousand, two hundred thousand, three
hundred thousand, four hundred thousand or five hundred thousand members are analyzed
for each family. Similarly, the invention provides for the determination of an expression
pattern of multiple families, for example, 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 350,
400, 450, 500, 550, 600, 650 or 700 families, wherein one, two, three, four, five, ten, fifteen,
twenty, twenty-five, fifty, one hundred, two hundred, three hundred, four hundred, five
hundred, one thousand, two thousand, three thousand, four thousand, five thousand, six
thousand, seven thousand, eight thousand, nine thousand, ten thousand, twenty thousand,
fifty thousand, one hundred thousand, two hundred thousand, three hundred thousand, four
hundred thousand or five hundred thousand members are analyzed for each family.

By utilizing the methods of the present invention, a reference expression pattern can
be obtained for normal tissues or cells, for particular types of cancers as well as for stages of
particular types of cancers. Therefore, the present invention provides a method of assigning
an expression pattern of transposable elements to a type of cancerous cell in a sample,
comprising: a) determining expression of one or more families of transposable elements;
and b) assigning the expression pattern obtained from step a) to the type of cancerous cell in
the sample. The present invention also provides a method of diagnosing cancer comprising:
a) determining expression of one or more families of transposable elements in a sample to
obtain an expression pattern; b) matching the expression pattern of step a) with a known
expression pattern for a type of cancer; and c) diagnosing the type of cancer based on
matching of the expression pattern of a) with a known expression pattern for a type of
cancer.
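
By way of non-limiting illustration, the matching step b) and the diagnosing step c) can
be sketched in Python as follows. This description does not prescribe a particular matching
criterion; the sketch assumes, purely for illustration, that each pattern is a mapping from
transposable element family to an expression level and that patterns are compared by
Euclidean distance with a cut-off. The function names, the reference labels, the expression
values and the `max_distance` cut-off are hypothetical.

```python
import math


def pattern_distance(a, b):
    """Euclidean distance between two patterns (dicts: family -> level).

    A family absent from a pattern is treated as not expressed (level 0.0).
    """
    families = set(a) | set(b)
    return math.sqrt(sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in families))


def match_pattern(test_pattern, reference_patterns, max_distance=1.0):
    """Return (label, distance) for the closest reference pattern.

    If no reference lies within max_distance, the label is None, meaning the
    test pattern does not match any known pattern in the library.
    """
    best_label, best_distance = None, math.inf
    for label, reference in reference_patterns.items():
        distance = pattern_distance(test_pattern, reference)
        if distance < best_distance:
            best_label, best_distance = label, distance
    if best_distance > max_distance:
        return None, best_distance
    return best_label, best_distance


# Hypothetical reference library and test pattern (illustrative values only).
references = {
    "normal": {"LTR7A": 0.1, "MER4A": 0.2, "THE1B": 0.1},
    "ovarian_stage_I": {"LTR7A": 2.0, "MER4A": 0.3, "THE1B": 0.1},
    "ovarian_stage_III": {"LTR7A": 2.0, "MER4A": 1.8, "THE1B": 1.5},
}
test_pattern = {"LTR7A": 1.9, "MER4A": 1.7, "THE1B": 1.6}
label, distance = match_pattern(test_pattern, references)
```

Returning no label when every reference is too distant reflects the possibility that a
sample does not correspond to any pattern yet held in the library.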

In the methods of the present invention, the expression pattern obtained from a
sample taken from a subject can be obtained from outside sources, such as a testing
laboratory or a commercial source. Therefore, the step of obtaining the expression pattern
can be performed by one skilled artisan and the step of comparing the expression pattern
can be performed by a second skilled artisan. Thus, the present invention provides a
method of diagnosing cancer comprising: a) matching a test transposable element
expression pattern with a known expression pattern for a type of cancer; and b) diagnosing
the type of cancer based on matching of the test expression pattern with a known expression
pattern for a type of cancer.

For example, one of skill in the art can obtain an ovarian tumor cell and determine
the expression pattern of one or more transposable element families. By determining which
transposable element families are expressed as well as which members of these transposable
element families are expressed, one of skill in the art can assign this pattern to an ovarian
tumor cell. This can be done for an ovarian tumor cell at different stages of cancer, such
that a library of expression patterns is readily available not only to diagnose but also to
stage ovarian cancer. Similarly, this can be done for any type of cancer cell, such as a
carcinoma cell, a fibroma cell, a sarcoma cell, a teratoma cell, a blastoma cell, a breast
tumor cell of epithelial origin, an ovarian tumor cell of epithelial, stromal or germ cell
origin, mixed cell types from a tumor or any other cancer cell. By determining the
expression patterns of transposable elements at different stages of cancer, the skilled artisan
can determine which transposable element families and which members of these families
are involved in cancer and cancer progression.

Such libraries of expression patterns are useful for diagnosis, staging and treatment.
For example, a sample can be obtained from a patient or subject in need of diagnosis and
assayed for transposable element expression. Once the expression pattern is determined
according to the methods of the present invention, this expression pattern can be compared
to a library of expression patterns to determine the type of cancer as well as the stage of
cancer associated with the expression pattern. Once this is determined, appropriate
treatment can be prescribed. In addition to identifying expression patterns for different
stages of cancer, the present methods are also useful for identifying expression patterns of
cancer cells after therapeutic intervention. For example, a sample can be obtained from a
patient or subject undergoing treatment for a cancer such as prostate cancer, lymphoma,
skin cancer, GI-tract cancer or any other type of cancer. Expression patterns can be
obtained and compared to expression patterns before treatment. In this way, the changes in
transposable element expression can be monitored such that one of skill in the art would
know which transposable element families as well as which members of each family are
affected by the treatment. If improvement is seen in the patient, these improvements can be
attributed to changes in transposable element expression. Since the skilled artisan will have
reference patterns for a normal tissue or cell, changes in transposable element expression
after treatment can be monitored to determine if the treatment results in a transposable
element expression pattern that more closely resembles normal or "baseline" expression
patterns. Improvements can also be monitored clinically by observing changes in tissue
health, cellular changes and changes in the subject's overall health. In this way, one of skill
in the art can correlate clinical changes with changes in transposable element expression.
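
By way of non-limiting illustration, the comparison of patterns obtained before and
after treatment against a normal or baseline pattern can be sketched in Python as follows.
The sketch assumes patterns are mappings from transposable element family to an expression
level and uses Euclidean distance to the baseline as one possible measure of resemblance.
The function names, the `min_change` cut-off and all expression values are hypothetical.

```python
import math


def distance_to_baseline(pattern, baseline):
    """Euclidean distance between a pattern and the baseline pattern.

    A family absent from either mapping is treated as not expressed (0.0).
    """
    families = set(pattern) | set(baseline)
    return math.sqrt(
        sum((pattern.get(f, 0.0) - baseline.get(f, 0.0)) ** 2 for f in families)
    )


def treatment_report(before, after, baseline, min_change=0.5):
    """Summarize how a treatment changed transposable element expression.

    Reports the families whose level changed by at least min_change and
    whether the post-treatment pattern lies closer to the baseline.
    """
    changed = {}
    for family in sorted(set(before) | set(after)):
        delta = after.get(family, 0.0) - before.get(family, 0.0)
        if abs(delta) >= min_change:
            changed[family] = delta
    d_before = distance_to_baseline(before, baseline)
    d_after = distance_to_baseline(after, baseline)
    return {
        "changed_families": changed,
        "distance_before": d_before,
        "distance_after": d_after,
        "closer_to_baseline": d_after < d_before,
    }


# Hypothetical patterns (illustrative values only).
baseline = {"LTR7A": 0.1, "MER4A": 0.2}
before = {"LTR7A": 2.0, "MER4A": 1.8}
after = {"LTR7A": 0.4, "MER4A": 1.7}
report = treatment_report(before, after, baseline)
```

A report of this kind could be set beside clinical observations so that changes in
expression and changes in the subject's health are considered together.
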
For cancers such as breast cancer and ovarian cancer, once a tissue sample is
obtained from a subject, this tissue sample can be compared to a library of tissue samples
from many subjects, representing various stages of the cancerous tumor. By comparing the
tissue sample to a library of tissue samples with known transposable element expression
patterns, one of skill in the art can tailor treatment to the individual needs of the subject.
For example, if the expression pattern for the subject matches the expression pattern of a
particular stage of cancer that is amenable to treatment with a chemotherapeutic agent, then
the subject is a candidate for that treatment. Similarly, one of skill in the art can determine
the likelihood that the subject will respond to a particular treatment by determining whether
or not the subject's pattern corresponds to patterns obtained for those who have responded
to treatment. In this way, treatments can be personalized to maximize the outcome while
minimizing unnecessary side effects. The patterns in the libraries utilized for comparison
purposes can be grouped by age, medical history or other categories in order to better
determine the likelihood of response for subjects. In certain cases, the pattern obtained
from the subject may correspond to a pattern for a stage of cancer that does not respond to
any available treatment. In cases such as these, one of skill in the art may determine that
treatment may not be advisable because the subject may suffer unnecessarily with little or
no likelihood of success.
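
The library comparison described above can be illustrated with a minimal sketch. The
description does not specify a similarity measure, so the per-family Jaccard index used here
is an assumption, as are the function names, the stage labels and the member numbers, all of
which are hypothetical and appear for illustrative purposes only.

```python
from typing import Dict, Set, Tuple

# A pattern maps a family name to the set of expressed member identifiers.
Pattern = Dict[str, Set[int]]


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pattern_similarity(p: Pattern, q: Pattern) -> float:
    """Mean per-family Jaccard similarity over all families in either pattern."""
    families = set(p) | set(q)
    if not families:
        return 1.0
    total = sum(jaccard(p.get(f, set()), q.get(f, set())) for f in families)
    return total / len(families)


def best_match(sample: Pattern, library: Dict[str, Pattern]) -> Tuple[str, float]:
    """Return the label and score of the library pattern closest to the sample."""
    label = max(library, key=lambda name: pattern_similarity(sample, library[name]))
    return label, pattern_similarity(sample, library[label])


# Hypothetical reference library keyed by stage label (illustrative values only).
library = {
    "stage I": {"A": {1, 3}, "B": {23}},
    "stage III": {"A": {1, 3, 5, 7, 9}, "B": {23, 56, 78}},
}
sample = {"A": {1, 3, 5, 7}, "B": {23, 56, 78}}
label, score = best_match(sample, library)
```

In practice, a library grouped by age or medical history, as described above, could be
handled by selecting the relevant subset of reference patterns before calling `best_match`.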

As mentioned above, one of skill in the art will be able to analyze and interpret the
differences in expression. For example, if before treatment, certain families and members
of these families are expressed, and after treatment, fewer families and/or members of these
families are expressed, it can be said that this particular treatment is effective in reducing
expression of these transposable elements, such that the treatment is effective in treating the
cancer. In some instances, effective treatments may involve decreasing the expression of
certain transposable elements and increasing the expression of others. Therefore, once
libraries of expression patterns are established from untreated and treated cancer subjects,
one of skill in the art will know whether or not treatment is effective in a particular subject
by comparing the expression pattern of a sample from the patient at different stages of
treatment, with reference patterns established for the successful treatment of that particular
type of cancer. If a treatment is not successful in a particular subject, the skilled artisan will
recognize this by noting that the expression pattern is not changing as expected, and other
dosages, therapies or treatments can be employed.

Therefore, the present invention also provides a method of determining the
effectiveness of an anti-cancer therapeutic in a subject comprising: a) determining
expression of one or more families of transposable elements, in a sample obtained from the
subject, to obtain a first expression pattern; b) administering an anti-cancer therapeutic to
the subject; c) determining expression of one or more families of transposable elements in a
sample obtained from the subject after administration of an anti-cancer therapeutic to obtain
a second expression pattern; and d) comparing the second expression pattern with the first
expression pattern such that if the differences between the expression patterns can be
correlated with successful treatment, the anti-cancer therapeutic is an effective anti-cancer
therapeutic. The changes observed between expression patterns can vary depending on the
type of cancer and the stage of cancer. The changes observed can also vary depending on
the size, age, weight and other physiological characteristics of the subject.

In some instances, an effective anti-cancer therapeutic will result in fewer
transposable elements being expressed in the second expression pattern as compared to the
first expression pattern. In other instances, there may be more transposable elements
expressed in the second pattern as compared to the first expression pattern. For example,
one of skill in the art can diagnose a cancer utilizing the methods of the present invention
and assign a first expression pattern to a sample from a subject. The following example is
not meant to be limiting and the numbering of transposable elements appears for illustrative
purposes only and not for purposes of identifying any particular retroelement sequences. As
an example, the first expression pattern comprises the expression of transposable elements
1, 3, 5, 7, 9 from transposable element family A, the expression of transposable elements
23, 56 and 78 from transposable element family B and the expression of transposable
elements 10, 15, 25 from transposable element family C. After administration of an
anti-cancer therapeutic, a second expression pattern is obtained. The second expression pattern
comprises, for example, the expression of transposable elements 3, 5, 9 from family A, the
expression of transposable element 23 from family B and the expression of transposable
element 15 from transposable element family C. The skilled artisan, upon comparing the
patterns, will determine that the anti-cancer therapeutic is effective in reducing the
expression of transposable elements 1 and 7 from family A, transposable elements 56 and
78 from family B, and transposable elements 10 and 25 from transposable element family C.
The skilled artisan can continue to monitor changes throughout treatment in order to
determine which transposable elements are suppressed or expressed as treatment progresses.
One of skill in the art can also compare the expression pattern obtained after treatment to
the expression pattern of a normal, non-cancerous cell to determine how the treatment is
progressing. If the expression pattern after treatment resembles the expression pattern of a
normal cell, the treatment can be said to be successful; however, the expression pattern need
not be exactly like the expression pattern of a normal cell in order to deem a treatment
effective. In effect, if the changes in transposable element expression after treatment are
indicative of progression toward the expression pattern of a normal cell, the treatment can
be said to be successful.
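
The comparison of the first and second expression patterns in the example above amounts to
a per-family set difference. The following sketch reproduces that example; the function
names and the dictionary representation are assumptions made for illustration and are not
part of the described methods.

```python
from typing import Dict, Set

# A pattern maps a family name to the set of expressed member numbers.
Pattern = Dict[str, Set[int]]


def silenced_elements(first: Pattern, second: Pattern) -> Pattern:
    """Members expressed in the first pattern but absent from the second."""
    return {fam: members - second.get(fam, set()) for fam, members in first.items()}


def activated_elements(first: Pattern, second: Pattern) -> Pattern:
    """Members expressed in the second pattern but absent from the first."""
    return {fam: members - first.get(fam, set()) for fam, members in second.items()}


# Patterns taken from the illustrative example in the text.
first = {"A": {1, 3, 5, 7, 9}, "B": {23, 56, 78}, "C": {10, 15, 25}}
second = {"A": {3, 5, 9}, "B": {23}, "C": {15}}

lost = silenced_elements(first, second)
gained = activated_elements(first, second)
# lost   -> {"A": {1, 7}, "B": {56, 78}, "C": {10, 25}}
# gained -> {"A": set(), "B": set(), "C": set()}
```

Reporting both directions covers the instances noted above in which an effective treatment
decreases the expression of some elements while increasing the expression of others.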

Analysis of Methylation Patterns

The present invention also provides methods of assessing the methylation status of
transposable element sequences and its role in cancer development and progression. Thus,
the present invention also provides methods for the determination of methylation patterns of
transposable element sequences. By analyzing global methylation patterns of transposable
element sequences and transposable element families, one of skill in the art can assign
particular transposable element methylation patterns to types of cancer. Such methylation
patterns can be used to diagnose, classify and stage cancer. These transposable element
methylation patterns can be used in combination with transposable element expression
patterns described herein to diagnose, classify and stage cancer.

Also provided by the present invention is a method of determining a methylation
pattern of one or more families of transposable element genes in a sample comprising
determining methylation of one or more families of transposable elements.

In the present invention, methylation patterns can include one, two, three, four, five,
six, seven, eight, nine, ten, twenty or more families of transposable elements and at least
one, two, three, four, five, ten, fifteen, twenty, twenty-five, fifty, one hundred, two hundred,
three hundred, four hundred, five hundred members, one thousand, two thousand, three
thousand, four thousand, five thousand, six thousand, seven thousand, eight thousand, nine
thousand, ten thousand, twenty thousand, fifty thousand, one hundred thousand, two
hundred thousand, three hundred thousand, four hundred thousand or five hundred thousand
members of each transposable element family. For example, the present invention provides
for the determination of a methylation pattern of one family of transposable elements in
which one, two, three, four, five, ten, fifteen, twenty, twenty-five, fifty, one hundred, two
hundred, three hundred, four hundred, five hundred members, one thousand, two thousand,
three thousand, four thousand, five thousand, six thousand, seven thousand, eight thousand,
nine thousand, ten thousand, twenty thousand, fifty thousand, one hundred thousand, two
hundred thousand, three hundred thousand, four hundred thousand or five hundred thousand
members of the transposable element family are analyzed. The present invention also
provides for the determination of a methylation pattern of two families, wherein one, two,
three, four, five, ten, fifteen, twenty, twenty-five, fifty, one hundred, two hundred, three
hundred, four hundred, five hundred members, one thousand, two thousand, three thousand,
four thousand, five thousand, six thousand, seven thousand, eight thousand, nine thousand,
ten thousand, twenty thousand, fifty thousand, one hundred thousand, two hundred
thousand, three hundred thousand, four hundred thousand or five hundred thousand
members are analyzed for each family. Similarly, the invention provides for the
determination of a methylation pattern of three families, wherein one, two, three, four, five,
ten, fifteen, twenty, twenty-five, fifty, one hundred, two hundred, three hundred, four
hundred, five hundred members, one thousand, two thousand, three thousand, four
thousand, five thousand, six thousand, seven thousand, eight thousand, nine thousand, ten
thousand, twenty thousand, fifty thousand, one hundred thousand, two hundred thousand,
three hundred thousand, four hundred thousand or five hundred thousand members are
analyzed for each family. Similarly, the invention provides for the determination of a
methylation pattern of multiple families, for example, 10, 20, 30, 40, 50, 100, 150, 200, 250,
300, 350, 400, 450, 500, 550, 600, 650 or 700 families wherein one, two, three, four, five,
ten, fifteen, twenty, twenty-five, fifty, one hundred, two hundred, three hundred, four
hundred, five hundred, one thousand, two thousand, three thousand, four thousand, five
thousand, six thousand, seven thousand, eight thousand, nine thousand, ten thousand,
twenty thousand, fifty thousand, one hundred thousand, two hundred thousand, three
hundred thousand, four hundred thousand or five hundred thousand members are analyzed
for each family.

By utilizing the methods of the present invention, a reference methylation pattern
can be obtained for normal tissues or cells, for particular types of cancers as well as for
stages of particular types of cancers. Therefore, the present invention provides a method of
assigning a methylation pattern of transposable elements to a type of cancerous cell in a
sample, comprising: a) determining the methylation pattern of one or more families of
transposable elements; and b) assigning the methylation pattern obtained from step a) to the
type of cancerous cell in the sample.

The present invention also provides a method of diagnosing cancer comprising: a)
determining the methylation pattern of one or more families of transposable elements in a
sample to obtain a methylation pattern; b) matching the methylation pattern of step a) with a
known methylation pattern for a type of cancer; and c) diagnosing the type of cancer based
on matching of the methylation pattern of a) with a known methylation pattern for a type of
cancer.

In the methods of the present invention, the methylation pattern obtained from a
sample taken from a subject can be obtained from outside sources, such as a testing
laboratory or a commercial source. Therefore, the step of obtaining the methylation pattern
can be performed by one skilled artisan and the step of comparing the methylation pattern
can be performed by a second skilled artisan. Thus, the present invention provides a
method of diagnosing cancer comprising: a) matching a test transposable element
methylation pattern with a known methylation pattern for a type of cancer; and b)
diagnosing the type of cancer based on matching of the test methylation pattern with a
known methylation pattern for a type of cancer.

For example, one of skill in the art can obtain an ovarian cancer sample and
determine the methylation pattern of one or more transposable element families. By
determining which transposable element families are methylated as well as which members
of these transposable element families are methylated, one of skill in the art can assign this
methylation pattern to an ovarian cancer sample. This can be done for ovarian cancer
samples at different stages of cancer, such that a library of methylation patterns is readily
available to not only diagnose but stage ovarian cancer. Similarly, this can be done for any
type of cancer cell, such as a carcinoma cell, a fibroma cell, a sarcoma cell, a teratoma cell,
a blastoma cell, a breast tumor cell of epithelial origin, an ovarian tumor cell of epithelial,
stromal or germ cell origin, mixed cell types from a tumor or any other cancer cell. By
determining the methylation patterns of transposable elements at different stages of cancer,
the skilled artisan can determine which transposable element families and which members
of these families are involved in cancer and cancer progression based on changes in DNA
methylation (and/or chromatin structure).

Such libraries of methylation patterns are useful for diagnosis, staging and treatment.
For example, a sample can be obtained from a patient or subject in need of diagnosis and
assayed for transposable element methylation. Once the methylation pattern is determined
according to the methods of the present invention, this methylation pattern can be compared
to a library of methylation patterns to determine the type of cancer as well as the stage of
cancer associated with the methylation pattern. Once this is determined, appropriate
treatment can be prescribed. In addition to identifying methylation patterns for different
stages of cancer, the present methods are also useful for identifying methylation patterns of
cancer cells after therapeutic intervention. For example, a sample can be obtained from a
patient or subject undergoing treatment for a cancer such as prostate cancer, lymphoma,
skin cancer, GI-tract cancer or any other type of cancer. Methylation patterns can be
obtained and compared to methylation patterns before treatment. In this way, the changes
in transposable element methylation can be monitored such that one of skill in the art would
know which transposable element families as well as which members of each family are
affected by the treatment. If improvement is seen in the patient, these improvements can be
attributed to changes in transposable element methylation. Since the skilled artisan will
have reference patterns for a normal tissue or cell, changes in transposable element
methylation after treatment can be monitored to determine if the treatment results in a
transposable element methylation pattern that more closely resembles normal or "baseline"
methylation patterns. Improvements can also be monitored clinically by observing changes
in tissue health, cellular changes and changes in the subject's overall health. In this way,
one of skill in the art can correlate clinical changes with changes in transposable element
methylation.

For cancers such as breast cancer and ovarian cancer, once a tissue sample is
obtained from a subject, this tissue sample can be compared to a library of tissue samples
from many subjects, representing various stages of the cancerous tumor. By comparing the
tissue sample to a library of tissue samples with known transposable element methylation
patterns, one of skill in the art can tailor treatment to the individual needs of the subject.
For example, if the methylation pattern for the subject matches the methylation pattern of a
particular stage of cancer that is amenable to treatment with a chemotherapeutic agent, then
the subject is a candidate for that treatment. Similarly, one of skill in the art can determine
the likelihood that the subject will respond to a particular treatment by determining whether
or not the subject's pattern corresponds to patterns obtained for those who have responded
to treatment. In this way, treatments can be personalized to maximize the outcome while
minimizing unnecessary side effects. The patterns in the libraries utilized for comparison
purposes can be grouped by age, medical history or other categories in order to better
determine the likelihood of response for subjects. In certain cases, the pattern obtained
from the subject may correspond to a pattern for a stage of cancer that does not respond to
any available treatment. In cases such as these, one of skill in the art may determine that
treatment may not be advisable because the subject may suffer unnecessarily with little or
no likelihood of success.

One of skill in the art will be able to assess the differences in methylation. For
example, if before treatment, certain families and members of these families are methylated,
and after treatment, more families and/or members of these families are methylated, it can
be said that this particular treatment is effective in increasing transposable element
methylation such that the treatment is effective in treating the cancer. In some instances,
effective treatments may involve decreasing the methylation of certain transposable
elements and increasing the methylation of others. Therefore, once libraries of methylation
patterns are established from untreated and treated cancer subjects, one of skill in the art
will know whether or not treatment is effective in a particular subject by comparing the
methylation pattern of a sample from the patient at different stages of treatment, with
reference patterns established for the successful treatment of that particular type of cancer.
If a treatment is not successful in a particular subject, the skilled artisan will recognize this
by noting that the methylation pattern is not changing as expected, i.e., the methylation
pattern is not changing such that the methylation pattern more closely resembles the
methylation pattern of a noncancerous or successfully treated cancer cell, and other dosages,
therapies or treatments can be employed.

Therefore, the present invention also provides a method of determining the
effectiveness of an anti-cancer therapeutic in a subject comprising: a) determining the
methylation pattern of one or more families of transposable elements, in a sample obtained
from the subject, to obtain a first methylation pattern; b) administering an anti-cancer
therapeutic to the subject; c) determining the methylation pattern of one or more families of
transposable elements in a sample obtained from the subject after administration of an
anti-cancer therapeutic to obtain a second methylation pattern; and d) comparing the second
methylation pattern with the first methylation pattern such that if the differences between
the methylation patterns can be correlated with successful treatment, the anti-cancer
therapeutic is an effective anti-cancer therapeutic. The changes observed between
methylation patterns can vary depending on the type of cancer and the stage of cancer. The
changes in methylation patterns can also vary based on the size, age, weight and other
physiological characteristics of the subject.

In some instances, an effective anti-cancer therapeutic will result in fewer
transposable elements being methylated in the second methylation pattern as compared to
the first methylation pattern. In other instances, there may be more transposable elements
methylated in the second pattern as compared to the first methylation pattern. For example,
one of skill in the art can diagnose a cancer utilizing the methods of the present invention
and assign a first methylation pattern to a sample from a subject. The following example is
not meant to be limiting and the numbering of transposable elements appears for illustrative
purposes only and not for purposes of identifying any particular retroelement sequences. As
an example, this first methylation pattern comprises the methylation of transposable
elements 2, 4, 6, 8 and 10 from transposable element family A, the methylation of
transposable elements 24, 57 and 79 from transposable element family B and the
methylation of transposable elements 11, 16, and 26 from transposable element family C.
After administration of an anti-cancer therapeutic, a second methylation pattern is obtained.
The second methylation pattern comprises, for example, the methylation of transposable
elements 2, 4, 6, 8, 10, 12 and 14 from family A, the methylation of transposable elements
24, 57, 79 and 80 from family B and the methylation of transposable elements 11, 16, 26
and 32 from transposable element family C. The skilled artisan, upon comparing the
patterns, will determine that the anti-cancer therapeutic results in the methylation of
transposable elements 12 and 14 from family A, transposable element 80 from family B,
and transposable element 32 from transposable element family C. This second methylation
pattern can be compared to the methylation pattern of a normal cell to see if the treatment is
progressing toward a methylation pattern associated with a non-cancerous cell. This second
methylation pattern can also be compared to methylation patterns for different stages of the
particular cancer being treated in order to determine if this pattern corresponds to an
improvement or a deterioration in the subject's condition. The skilled artisan can continue
to monitor changes throughout treatment in order to determine which transposable elements
are methylated or non-methylated, and whether or not an improvement can be correlated to
changes in methylation, as treatment progresses.

As stated above, the methylation state of non-cancerous cells can serve as a guide to
one of skill in the art in determining the effectiveness of a treatment. One of skill in the art
can compare the methylation pattern obtained after treatment to the methylation pattern of a
normal, non-cancerous cell to determine how the treatment is progressing. If the
methylation pattern after treatment resembles the methylation pattern of a normal cell, the
treatment can be said to be successful; however, the methylation pattern need not be exactly
like the methylation pattern of a normal cell in order to deem a treatment effective. In other
words, if the changes in transposable element sequence methylation after treatment are
indicative of progression toward the methylation pattern of a normal cell, the treatment can
be said to be successful.
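
One way to make "progression toward the methylation pattern of a normal cell" concrete is
to count how many family members differ from a normal reference before and after treatment.
The sketch below does this with a symmetric set difference. The distance measure, the
function names and the normal reference pattern are assumptions introduced here for
illustration; the before and after patterns are those of the illustrative example above.

```python
from typing import Dict, Set

# A pattern maps a family name to the set of methylated member numbers.
Pattern = Dict[str, Set[int]]


def pattern_distance(p: Pattern, q: Pattern) -> int:
    """Count members whose methylation status differs between two patterns."""
    families = set(p) | set(q)
    return sum(len(p.get(f, set()) ^ q.get(f, set())) for f in families)


def moving_toward_reference(before: Pattern, after: Pattern, reference: Pattern) -> bool:
    """True if the post-treatment pattern is closer to the reference than before."""
    return pattern_distance(after, reference) < pattern_distance(before, reference)


before = {"A": {2, 4, 6, 8, 10}, "B": {24, 57, 79}, "C": {11, 16, 26}}
after = {"A": {2, 4, 6, 8, 10, 12, 14}, "B": {24, 57, 79, 80}, "C": {11, 16, 26, 32}}

# Hypothetical normal-cell reference; these member numbers are not from the text.
normal = {
    "A": {2, 4, 6, 8, 10, 12, 14, 18},
    "B": {24, 57, 79, 80},
    "C": {11, 16, 26, 32, 40},
}

improving = moving_toward_reference(before, after, normal)
```

Consistent with the text, the post-treatment pattern need not equal the normal pattern; the
check only asks whether the remaining difference has shrunk.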

The methylation patterns of the present invention can be correlated to transposable
element expression patterns and/or chromatin status patterns described herein, such that one
of skill in the art, upon obtaining a particular expression pattern and/or a chromatin status
pattern, will also know what the methylation status of the sample is. Also, upon obtaining
a particular methylation pattern, one of skill in the art will also know the
expression pattern and/or chromatin status of the sample.

Methods of measuring methylation are known in the art and include, but are not
limited to, methylation-specific PCR, methylation microarray analysis and ChIP (a
chromatin immunoprecipitation approach) analysis. Methylation can also be monitored by
digestion of nucleic acid sequences with methylation sensitive and non-sensitive restriction
enzymes followed by Southern blotting or PCR analysis of the restriction products (See
Takai et al. "Hypomethylation of LINE1 retrotransposon in human hepatocellular
carcinomas, but not in surrounding liver cirrhosis" Jpn. J. Clin. Oncol. 30(7) 306-309). One
of skill in the art could also utilize methods in which genomic DNA is digested followed by
PCR. (See, for example, Cartwright et al., "Analysis of Drosophila chromatin structure in
vivo" Methods in Enzymology, Vol. 304).

Methylation-specific PCR (MSP) technology utilizes the fact that DNA in humans is
methylated mainly at certain cytosines located 5' to guanosine. This occurs especially in
GC-rich regions, known as CpG islands. To distinguish the methylation state of a sequence,
MSP relies on differential chemical modification of cytosine residues in DNA. Treatment
with sodium bisulfite converts unmethylated cytosine residues into uracil, leaving the
methylated cytosines unchanged. This modification thus creates different DNA sequences
for methylated and unmethylated DNA. PCR primers can then be designed so as to
distinguish between these different sequences. Two sets of primers (and additional control
sets of primers) are designed: one set with sequences annealing to unchanged (methylated
in the genomic DNA) cytosines and the other set with sequences annealing to the altered
(unmethylated in the genomic DNA) cytosines. A comparison of PCR results using the two
sets of primers reveals the methylation state of a PCR product. If the primer set with the
altered sequence gives a PCR product, then the indicated cytosine was unmethylated. If the
primer set with the unchanged sequence gives a PCR product, then the cytosines were
methylated and thus protected from alteration. Evron et al. ("Detection of breast cancer
cells in ductal lavage fluid by methylation-specific PCR," Lancet 2001, 357: 1335-1336)
describes the use of MSP to detect breast cancer and is hereby incorporated in its entirety by
this reference.
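
The logic of MSP described above can be sketched in a few lines. This is a simplified
illustration under stated assumptions: the toy sequence and the primer sequences are
hypothetical, uracil is written as T (as it is read after PCR), only the top strand is
modeled, and "annealing" is reduced to an exact substring match of a sense-strand primer.
Real primer design must also account for strand complementarity and melting temperature.

```python
from typing import List, Set


def cpg_positions(seq: str) -> List[int]:
    """Indices of cytosines that are immediately 5' to a guanine (CpG sites)."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Convert each unmethylated C to T; methylated cytosines stay as C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def msp_call(converted: str, m_primer: str, u_primer: str) -> str:
    """Report which primer set would give a product on the converted template."""
    m_hit = m_primer in converted
    u_hit = u_primer in converted
    if m_hit and not u_hit:
        return "methylated"
    if u_hit and not m_hit:
        return "unmethylated"
    return "mixed" if m_hit and u_hit else "no product"


genomic = "GATCGACGTCAGCGT"           # hypothetical toy sequence
sites = cpg_positions(genomic)        # CpG cytosines at indices 3, 6 and 12

meth_template = bisulfite_convert(genomic, set(sites))   # all CpGs methylated
unmeth_template = bisulfite_convert(genomic, set())      # no methylation

# Hypothetical primers matching the unchanged (M) and altered (U) sequences.
M_PRIMER = "GATCGACG"
U_PRIMER = "GATTGATG"

call_meth = msp_call(meth_template, M_PRIMER, U_PRIMER)
call_unmeth = msp_call(unmeth_template, M_PRIMER, U_PRIMER)
```

Note that the non-CpG cytosine at index 9 is converted in both templates, which is why both
primer sets are designed against bisulfite-converted sequence rather than the genomic one.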

To use a microarray to study transposable element methylation, one of skill in the art
would select for methylated and unmethylated DNA from total genomic DNA. The
selectively isolated DNA is then hybridized to the transposable element array either directly
or after amplification, and patterns are compared between various cell types / tissue types
as described earlier in the patent application.

There are several approaches for selecting methylated DNA. One method is
chromatin immunoprecipitation (ChIP). Another method utilizes a column binding
approach, and a third method involves ligation of adapters to fragmented genomic DNA and
methylation-specific restriction digestion of the ligation products followed by PCR
amplification.

In all cases, the selected DNA fragments are labeled by incorporation of dNTPs
coupled with fluorescent dyes (for example Cy3 or Cy5 coupled dNTPs) and hybridization
to the microarray is performed according to standard protocols. One of skill in the art could
utilize the BioPrime DNA labeling system from Life Technologies or other kits available
for such labeling.

As stated above, microarray techniques would be known to one of skill in the art.
For example, U.S. Patent No. 6,410,229 and U.S. Patent No. 6,344,316, both hereby
incorporated by this reference, describe methods of hybridizing nucleic acids to high density
nucleic acid arrays. For example, one skilled in the art would first produce
fluorescent-labeled DNA isolated from the tissue of interest. A batch of labeled genomic/amplified
genomic DNAs representing either one sample or a mixture of two samples from the tissue
sources of interest is added to an array of oligonucleotides representing a plurality of known
transposable elements, as described above, under conditions that result in hybridization of
the DNAs to complementary-sequence oligonucleotides in the array. The array is then
examined by fluorescence under fluorescence excitation conditions in which transposable
element oligonucleotides in the array that are hybridized to genomic/amplified genomic
DNAs derived from the tissue of interest can be detected and quantified.
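
Where a mixture of two differently labeled samples is hybridized to the same array, the
quantified fluorescence is commonly summarized as a per-probe log ratio. The sketch below
assumes a hypothetical two-color design (one sample labeled with Cy5, the other with Cy3),
hypothetical probe names and intensities, and an arbitrary twofold threshold; none of these
values come from the description.

```python
import math
from typing import Dict, List


def log2_ratios(
    cy5: Dict[str, float], cy3: Dict[str, float], pseudocount: float = 1.0
) -> Dict[str, float]:
    """Per-probe log2(Cy5/Cy3); the pseudocount avoids division by zero."""
    shared = cy5.keys() & cy3.keys()
    return {
        probe: math.log2((cy5[probe] + pseudocount) / (cy3[probe] + pseudocount))
        for probe in shared
    }


def differential_probes(ratios: Dict[str, float], threshold: float = 1.0) -> List[str]:
    """Probes whose absolute log2 ratio meets the threshold (1.0 = twofold)."""
    return sorted(p for p, r in ratios.items() if abs(r) >= threshold)


# Hypothetical background-subtracted intensities for three array probes.
cy5 = {"L1_probe": 799.0, "Alu_probe": 99.0, "HERV_probe": 199.0}
cy3 = {"L1_probe": 199.0, "Alu_probe": 399.0, "HERV_probe": 199.0}

ratios = log2_ratios(cy5, cy3)
changed = differential_probes(ratios)
```

The set of probes passing the threshold, grouped by transposable element family, is one
possible representation of the pattern that is then compared between cell or tissue types.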

ChIP technology involves in vivo formaldehyde cross-linking of DNA and
associated proteins in intact cells, followed by selective immunoprecipitation of
protein-DNA complexes with specific antibodies. Such an approach allows detection of any protein
at its in vivo binding site directly. In particular, proteins that are not bound directly to DNA
or that depend on other proteins for binding activity in vivo can be analyzed by this method.
Since methylation involves methylation complexes that involve numerous proteins which
interact with DNA, by utilizing ChIP technology, methylation complexes can be
cross-linked to transposable element sequences to which they are bound and then an antibody
specific to one of the proteins (i.e., one of the proteins involved in the methylation complex,
such as methyltransferase or a protein having a methyl binding site, for example, MBD1)
can be utilized to immunoprecipitate the methylation complex-DNA bound sequence. The
complex can then be chemically released and the transposable element sequence to which it
was bound can be identified. For references describing ChIP technology, see Orlando
("Mapping chromosomal proteins in vivo by formaldehyde crosslinked-chromatin
immunoprecipitation," TIBS 2000, 25:99-104) and Kuo et al. ("In Vivo Cross-Linking and
Immunoprecipitation for Studying Dynamic Protein:DNA Associations in a Chromatin
Environment," 1999, 19: 425-433) both of which are incorporated in their entireties by this
reference.

The column binding approach is used to select for methylated DNA after genomic
DNA extraction. The column contains methyl-CpG-binding proteins, for example the
methyl-binding domain of rat MeCP2, covalently linked to a histidine tag, then attached to a
Ni-agarose matrix. Fragmented genomic DNA (digested with restriction enzymes, for
example MseI) is run through the column. The column retains DNA containing methylated
cytosines; unmethylated DNA is collected from the flow-through. Retained methylated
DNA is recovered from the column. (Cross, S.H., Charlton, J.A., Nan, X. and Bird, A.P.
(1994) Purification of CpG islands using a methylated DNA binding column. Nat Genet, 6,
236-244 and Brock, Huang, Chen and Johnson (2001) A novel technique for the
identification of CpG islands exhibiting altered methylation patterns (ICEAMP). Nucleic
Acids Research, vol. 29, no. 24). The isolated DNA can be ligated to linker oligonucleotides
and amplified by PCR. Fluorescence labeling and hybridization is then performed as
described above.

Formaldehyde crosslinking followed by chromatin immunoprecipitation is reviewed
in Orlando 2000. By addition of formaldehyde to live tissue/cells, DNA and nearby proteins
are cross-linked in vivo, followed by sonication of the tissue/cell suspension. The DNA is
fragmented in the process. Antibodies recognizing methyl-binding proteins are added and
the immune complexes are collected, thereby precipitating methylated DNA with associated
proteins. DNA without methyl-binding proteins will be collected from the supernatant. The
cross-linking step is then reversed for both fractions, followed by a DNA purification step.
The isolated DNA can be ligated to linker oligonucleotides and amplified by PCR.
Fluorescence labeling and hybridization is then performed as described above.

Linker ligation/Methylation-specific restriction/PCR can also be utilized. The
methods of the present invention can utilize a modified version of DMH (Differential
Methylation Hybridization) (References: Huang et al. 'Methylation profiling of CpG islands
in human breast cancer cells' Human Molecular Genetics 1999, Vol. 8, No. 3 and
Yan et al. 'Dissecting complex epigenetic alterations in breast cancer using CpG island
microarrays' Cancer Research 2001, 61, 8375-8380). Genomic DNA is digested with
MseI. Then, the ends of the resulting fragments are ligated to linker oligonucleotides.
Ligated fragments undergo restriction digestion with methylation-sensitive enzymes BstUI
and/or HpaII, followed by PCR amplification of undigested fragments. Fluorescence
labeling and hybridization is then performed as described above.

A COT-1 subtractive hybridization step can be utilized at some point before labeling
the DNA to separate out the highly repetitive sequences from the sample (See Craig et al.
'Removal of repetitive sequences from FISH probes using PCR-assisted affinity
chromatography' Human Genetics 1997, Vol. 100, 472-476).

Another technique, methylation-specific oligonucleotide (MSO) microarray, uses
bisulfite-modified DNA as a template for PCR amplification, resulting in conversion of
unmethylated cytosine, but not methylated cytosine, into thymine within CpG islands of
interest. The amplified product, therefore, may contain a pool of DNA fragments with
altered nucleotide sequences due to differential methylation status. A test sample is
hybridized to a set of oligonucleotide arrays that discriminate between methylated and
unmethylated cytosine at specific nucleotide positions, and quantitative differences in
hybridization are determined by fluorescence analysis. For examples of methylation
microarray techniques see Gitan et al. ("Methylation-specific oligonucleotide microarray: a
new potential for high-throughput methylation analysis," Genome Res. 2002, 12: 158-164.),
Shi et al. ("Oligonucleotide-based microarray for DNA methylation analysis: Principles and
applications," J. Cell Biochem. 2003, 88: 138-143.), Yan et al. ("Applications of CpG island
microarrays for high-throughput analysis of DNA methylation," J. Nutr. 2002, 132: 2430S-2434S),
Wei et al. ("Methylation microarray analysis of late-stage ovarian carcinomas
distinguishes progression-free survival in patients and identifies candidate epigenetic
markers," Clin Cancer Res. 2002, 8: 2246-2252.), all of which are incorporated herein, in
their entireties, by this reference.
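
The bisulfite conversion underlying the MSO approach can be illustrated with a minimal
sketch, offered for illustration only. The function names, the template sequence and the
methylated positions below are hypothetical and are not taken from any cited reference.
The sketch assumes complete conversion and models only the top strand after PCR.

```python
def bisulfite_pcr(seq, methylated):
    """Return the top-strand sequence read after bisulfite treatment and PCR.

    seq        -- DNA string (A/C/G/T)
    methylated -- set of 0-based positions of methylated cytosines
    Unmethylated C is read as T after amplification; methylated C stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")   # unmethylated cytosine is converted
        else:
            out.append(base)  # methylated cytosine and other bases are kept
    return "".join(out)


def call_cpg(converted, cpg_pos):
    """Classify one CpG cytosine position from the converted sequence."""
    return "methylated" if converted[cpg_pos] == "C" else "unmethylated"


template = "ACGTTCGACCGGA"  # hypothetical sequence; CpG cytosines at 1, 5 and 9
methylated = {1}            # hypothetical: only the first CpG is methylated
converted = bisulfite_pcr(template, methylated)
print(converted)
print({pos: call_cpg(converted, pos) for pos in (1, 5, 9)})
```

An oligonucleotide probe pair on the array plays the role of `call_cpg` here: one probe
matches the retained C and the other matches the T produced by conversion.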

Analysis of Chromatin Status

The present invention also provides methods of assessing the chromatin status of
transposable element sequences and its role in cancer development and progression. Thus,
the present invention also provides methods for the determination of chromatin status
patterns of transposable element sequences. By analyzing global chromatin status patterns
of transposable element sequences and transposable element families, one of skill in the art
can assign particular transposable element chromatin status patterns to types of cancer.
Such chromatin status patterns can be used to diagnose, classify and stage cancer. These
transposable element chromatin status patterns can be used in combination with
transposable element expression patterns and/or methylation patterns described herein to
diagnose, classify and stage cancer.

One of skill in the art would know how to assess chromatin status by methods
standard in the art. See Orlando ("Mapping chromosomal proteins in vivo by formaldehyde
crosslinked-chromatin immunoprecipitation," TIBS 2000, 25:99-104) and Kuo et al. ("In
Vivo Cross-Linking and Immunoprecipitation for Studying Dynamic Protein:DNA
Associations in a Chromatin Environment," Methods 1999, 19: 425-433), both of which are
incorporated in their entireties by this reference.

As utilized herein, "chromatin status" refers to the chromosomal structure or the
chromosomal accessibility or the ability of restriction enzymes to access a transposable
element sequence or a fragment thereof. Therefore, chromatin status patterns can contain
sequences that are accessible to restriction enzymes and sequences that are not accessible to
restriction enzymes.

Also provided by the present invention is a method of determining a chromatin
status pattern of one or more families of transposable element genes in a sample comprising
determining chromatin status of one or more families of transposable elements.

In the present invention, chromatin status patterns can include one, two, three, four,
five, six, seven, eight, nine, ten, twenty or more families of transposable elements and at
least one, two, three, four, five, ten, fifteen, twenty, twenty-five, fifty, one hundred, two
hundred, three hundred, four hundred, five hundred, one thousand, two thousand,
three thousand, four thousand, five thousand, six thousand, seven thousand, eight thousand,
nine thousand, ten thousand, twenty thousand, fifty thousand, one hundred thousand, two
hundred thousand, three hundred thousand, four hundred thousand or five hundred thousand
members of each transposable element family. For example, the present invention provides
for the determination of a chromatin status pattern of one family of transposable elements in
which one, two, three, four, five, ten, fifteen, twenty, twenty-five, fifty, one hundred, two
hundred, three hundred, four hundred, five hundred, one thousand, two thousand,
three thousand, four thousand, five thousand, six thousand, seven thousand, eight thousand,
nine thousand, ten thousand, twenty thousand, fifty thousand, one hundred thousand, two
hundred thousand, three hundred thousand, four hundred thousand or five hundred thousand
members of the transposable element family are analyzed. The present invention also
provides for the determination of a chromatin status pattern of two families, wherein one,
two, three, four, five, ten, fifteen, twenty, twenty-five, fifty, one hundred, two hundred, three
hundred, four hundred, five hundred, one thousand, two thousand, three thousand,
four thousand, five thousand, six thousand, seven thousand, eight thousand, nine thousand,
ten thousand, twenty thousand, fifty thousand, one hundred thousand, two hundred
thousand, three hundred thousand, four hundred thousand or five hundred thousand
members are analyzed for each family. Similarly, the invention provides for the
determination of a chromatin status pattern of three families, wherein one, two, three, four,
five, ten, fifteen, twenty, twenty-five, fifty, one hundred, two hundred, three hundred, four
hundred, five hundred, one thousand, two thousand, three thousand, four
thousand, five thousand, six thousand, seven thousand, eight thousand, nine thousand, ten
thousand, twenty thousand, fifty thousand, one hundred thousand, two hundred thousand,
three hundred thousand, four hundred thousand or five hundred thousand members are
analyzed for each family. Similarly, the invention provides for the determination of a
chromatin status pattern of multiple families, for example, 10, 20, 30, 40, 50, 100, 150, 200,
250, 300, 350, 400, 450, 500, 550, 600, 650 or 700 families wherein one, two, three, four,
five, ten, fifteen, twenty, twenty-five, fifty, one hundred, two hundred, three hundred, four
hundred, five hundred, one thousand, two thousand, three thousand, four thousand, five
thousand, six thousand, seven thousand, eight thousand, nine thousand, ten thousand,
twenty thousand, fifty thousand, one hundred thousand, two hundred thousand, three
hundred thousand, four hundred thousand or five hundred thousand members are analyzed
for each family.

By utilizing the methods of the present invention, a reference chromatin status
pattern can be obtained for normal tissues or cells, for particular types of cancers as well as
for stages of particular types of cancers. Therefore, the present invention provides a method
of assigning a chromatin status pattern of transposable elements to a type of cancerous cell
in a sample, comprising: a) determining the chromatin status pattern of one or more families of
transposable elements; and b) assigning the chromatin status pattern obtained from step a) to
the type of cancerous cell in the sample.

The present invention also provides a method of diagnosing cancer comprising: a)
determining the chromatin status pattern of one or more families of transposable elements in
a sample to obtain a chromatin status pattern; b) matching the chromatin status pattern of
step a) with a known chromatin status pattern for a type of cancer; and c) diagnosing the
type of cancer based on matching of the chromatin status pattern of a) with a known
chromatin status pattern for a type of cancer.
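
By way of illustration only, the matching of step b) can be sketched as a comparison of a
test pattern against a library of known patterns. The data structure (a mapping from
(family, element) to an accessible/not-accessible flag), the agreement score, the 0.8
threshold and every library entry below are assumptions made for this sketch; the
disclosure does not prescribe a particular scoring scheme.

```python
def similarity(test, reference):
    """Fraction of shared elements whose chromatin status agrees."""
    shared = test.keys() & reference.keys()
    if not shared:
        return 0.0
    agree = sum(test[key] == reference[key] for key in shared)
    return agree / len(shared)


def best_match(test, library, threshold=0.8):
    """Return (label, score) of the closest known pattern, or (None, score)
    when no known pattern reaches the threshold. Library must be non-empty."""
    score, label = max((similarity(test, ref), name) for name, ref in library.items())
    return (label, score) if score >= threshold else (None, score)


# Hypothetical reference library: True = accessible, False = not accessible.
library = {
    "ovarian carcinoma, stage I": {("A", 2): True, ("A", 4): False, ("B", 57): True, ("C", 16): True},
    "ovarian carcinoma, stage III": {("A", 2): True, ("A", 4): True, ("B", 57): True, ("C", 16): False},
    "normal ovarian tissue": {("A", 2): False, ("A", 4): False, ("B", 57): False, ("C", 16): False},
}
sample_pattern = {("A", 2): True, ("A", 4): False, ("B", 57): True, ("C", 16): True}

print(best_match(sample_pattern, library))
```

Returning no label below the threshold reflects the possibility that a sample matches none
of the known patterns well enough to support a diagnosis.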

In the methods of the present invention, the chromatin status pattern obtained from a
sample taken from a subject can be obtained from outside sources, such as a testing
laboratory or a commercial source. Therefore, the step of obtaining the chromatin status
pattern can be performed by one skilled artisan and the step of comparing the chromatin
status pattern can be performed by a second skilled artisan. Thus, the present invention
provides a method of diagnosing cancer comprising: a) matching a test transposable
element chromatin status pattern with a known chromatin status pattern for a type of cancer;
and b) diagnosing the type of cancer based on matching of the test chromatin status pattern
with a known chromatin status pattern for a type of cancer.

For example, one of skill in the art can obtain an ovarian cancer sample and
determine the chromatin status pattern of one or more transposable element families. By
determining the chromosomal accessibility of transposable element families as well as the
chromosomal accessibility of members of these transposable element families, one of skill
in the art can assign this chromatin status pattern to an ovarian cancer sample. This can be
done for ovarian cancer samples at different stages of cancer, such that a library of
chromatin status patterns is readily available to not only diagnose but stage ovarian cancer.
Similarly, this can be done for any type of cancer cell, such as a carcinoma cell, a fibroma
cell, a sarcoma cell, a teratoma cell, a blastoma cell, a breast tumor cell of epithelial origin,
an ovarian tumor cell of epithelial, stromal or germ cell origin, mixed cell types from a
tumor or any other cancer cell. By determining the chromatin status patterns of
transposable elements at different stages of cancer, the skilled artisan can determine which
transposable element families and which members of these families are involved in cancer
and cancer progression based on changes in chromatin structure.

Such libraries of chromatin status patterns are useful for diagnosis, staging and treatment.
For example, a sample can be obtained from a patient or subject in need of diagnosis and
assayed for chromatin status. Once the chromatin status pattern is determined according to
the methods of the present invention, this chromatin status pattern can be compared to a
library of chromatin status patterns to determine the type of cancer as well as the stage of
cancer associated with the chromatin pattern. Once this is determined, appropriate
treatment can be prescribed. In addition to identifying chromatin status patterns for
different stages of cancer, the present methods are also useful for identifying chromatin
status patterns of cancer cells after therapeutic intervention. For example, a sample can be
obtained from a patient or subject undergoing treatment for a cancer such as prostate cancer,
lymphoma, skin cancer, GI-tract cancer or any other type of cancer. Chromatin status
patterns can be obtained and compared to chromatin status patterns before treatment. In this
way, the changes in transposable element chromatin status can be monitored such that one
of skill in the art would know which transposable element families as well as which
members of each family are affected by the treatment. If improvement is seen in the
patient, these improvements can be attributed to changes in transposable element chromatin
status. Since the skilled artisan will have reference patterns for a normal tissue or cell,
changes in transposable element chromatin status after treatment can be monitored to
determine if the treatment results in a transposable element chromatin status pattern that
more closely resembles normal or "baseline" chromatin status patterns. Improvements can
also be monitored clinically by observing changes in tissue health, cellular changes and
changes in the subject's overall health. In this way, one of skill in the art can correlate
clinical changes with changes in transposable element chromatin status.

For cancers such as breast cancer and ovarian cancer, once a tissue sample is
obtained from a subject, this tissue sample can be compared to a library of tissue samples
from many subjects, representing various stages of the cancerous tumor. By comparing the
tissue sample to a library of tissue samples with known transposable element chromatin
status patterns, one of skill in the art can tailor treatment to the individual needs of the
subject. For example, if the chromatin status pattern for the subject matches the chromatin
status pattern of a particular stage of cancer that is amenable to treatment with a
chemotherapeutic agent, then the subject is a candidate for that treatment. Similarly, one of
skill in the art can determine the likelihood that the subject will respond to a particular
treatment by determining whether or not the subject's pattern corresponds to patterns
obtained for those who have responded to treatment. In this way, treatments can be
personalized to maximize the outcome while minimizing unnecessary side effects. The
patterns in the libraries utilized for comparison purposes can be grouped by age, medical
history or other categories in order to better determine the likelihood of response for
subjects. In certain cases, the pattern obtained from the subject may correspond to a pattern
for a stage of cancer that does not respond to any available treatment. In cases such as
these, one of skill in the art may determine that treatment may not be advisable because the
subject may suffer unnecessarily with little or no likelihood of success.

In some instances, effective treatments may involve decreasing the chromatin
accessibility of certain transposable elements and increasing the chromatin accessibility of
others. Therefore, once libraries of chromatin status patterns are established from untreated
and treated cancer subjects, one of skill in the art will know whether or not treatment is
effective in a particular subject by comparing the chromatin status pattern of a sample from
the patient at different stages of treatment, with reference patterns established for the
successful treatment of that particular type of cancer. If a treatment is not successful in a
particular subject, the skilled artisan will recognize this by noting that the chromatin status
pattern is not changing as expected, i.e., the chromatin status pattern is not changing such
that the chromatin status pattern more closely resembles the chromatin status pattern of a
non-cancerous or successfully treated cancer cell, and other dosages, therapies or treatments
can be employed.

Therefore, the present invention also provides a method of determining the
effectiveness of an anti-cancer therapeutic in a subject comprising: a) determining the
chromatin status pattern of one or more families of transposable elements, in a sample
obtained from the subject, to obtain a first chromatin status pattern; b) administering an
anti-cancer therapeutic to the subject; c) determining the chromatin status pattern of one or more
families of transposable elements in a sample obtained from the subject after administration
of an anti-cancer therapeutic to obtain a second chromatin status pattern; and d) comparing
the second chromatin status pattern with the first chromatin status pattern such that if the
differences between the chromatin status patterns can be correlated with successful
treatment, the anti-cancer therapeutic is an effective anti-cancer therapeutic. The changes
observed between chromatin status patterns can vary depending on the type of cancer and
the stage of cancer. The changes in chromatin status patterns can also vary based on the
size, age, weight and other physiological characteristics of the subject.

In some instances, an effective anti-cancer therapeutic will result in fewer
transposable elements being accessible to restriction enzymes in the second chromatin status
pattern as compared to the first chromatin status pattern. In other instances, there may be
more transposable elements accessible to restriction enzymes in the second pattern as
compared to the first chromatin status pattern. For example, one of skill in the art can
diagnose a cancer utilizing the methods of the present invention and assign a first chromatin
status pattern to a sample from a subject. The following example is not meant to be limiting
and the numbering of transposable elements appears for illustrative purposes only and not
for purposes of identifying any particular transposable element sequences. As an example,
this first chromatin status pattern comprises the chromatin status of transposable elements 2
(accessible), 4 (not accessible), 6 (accessible), 8 (not accessible) and 10 (not accessible)
from transposable element family A, the chromatin status of transposable elements 24 (not
accessible), 57 (accessible) and 79 (not accessible) from transposable element family B and
the chromatin status of transposable elements 11 (not accessible), 16 (accessible), and 26
(not accessible) from transposable element family C. After administration of an anti-cancer
therapeutic, a second chromatin status pattern is obtained. The second chromatin status
pattern comprises, for example, the chromatin status of transposable elements 2 (not
accessible), 4 (not accessible), 6 (accessible), 8 (not accessible) and 10 (not accessible) from
family A, the chromatin status of transposable elements 24 (not accessible), 57 (not
accessible) and 79 (accessible) from family B and the chromatin status of transposable
elements 11 (accessible), 16 (not accessible) and 26 (not accessible) from transposable
element family C. The skilled artisan, upon comparing the patterns, will determine that the
anti-cancer therapeutic results in changes in the chromatin status of transposable element 2
from family A, transposable elements 57 and 79 from family B, and transposable elements
11 and 16 from transposable element family C. This second chromatin status pattern can be
compared to the chromatin status pattern of a normal cell to see if the treatment is
progressing toward a chromatin status pattern associated with a non-cancerous cell. This
second chromatin status pattern can also be compared to chromatin status patterns for
different stages of the particular cancer being treated in order to determine if this pattern
corresponds to an improvement or a deterioration in the subject's condition. The skilled
artisan can continue to monitor changes throughout treatment in order to determine which
transposable elements are accessible or not accessible and whether or not an improvement
can be correlated to changes in chromatin status, as treatment progresses.
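
The comparison in the foregoing example can be sketched as follows. The element numbers
and statuses are those of the illustrative example; the nested-dictionary representation
and the function name `changed_elements` are assumptions made for this sketch only.

```python
# True = accessible, False = not accessible (statuses from the example above).
first = {
    "A": {2: True, 4: False, 6: True, 8: False, 10: False},
    "B": {24: False, 57: True, 79: False},
    "C": {11: False, 16: True, 26: False},
}
second = {
    "A": {2: False, 4: False, 6: True, 8: False, 10: False},
    "B": {24: False, 57: False, 79: True},
    "C": {11: True, 16: False, 26: False},
}


def changed_elements(before, after):
    """Map each family to the elements whose status differs between patterns."""
    changes = {}
    for family in sorted(before):
        diffs = [
            element
            for element in sorted(before[family])
            if before[family][element] != after[family].get(element)
        ]
        if diffs:
            changes[family] = diffs
    return changes


print(changed_elements(first, second))
```

With the statuses as listed, the elements that differ are 2 (family A), 57 and 79
(family B), and 11 and 16 (family C).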

As stated above, the chromatin status of non-cancerous cells can serve as a
guide to one of skill in the art in determining the effectiveness of a treatment. One of skill
in the art can compare the chromatin status pattern obtained after treatment to the chromatin
status pattern of a normal, non-cancerous cell to determine how the treatment is progressing.
If the chromatin status pattern after treatment resembles the chromatin status pattern of a
normal cell, the treatment can be said to be successful; however, the chromatin status
pattern need not be exactly like the chromatin status pattern of a normal cell in order to
deem a treatment effective. In other words, if the changes in transposable element sequence
chromatin status after treatment are indicative of progression toward the chromatin status
pattern of a normal cell, the treatment can be said to be successful.
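
One possible way to express "progression toward the chromatin status pattern of a normal
cell" is to count how many elements differ from a normal reference before and after
treatment. The following is a minimal sketch under that assumption; the count-based
distance, the element labels and all three patterns are hypothetical.

```python
def distance(pattern, reference):
    """Number of reference elements whose status differs in the pattern."""
    return sum(pattern[key] != reference[key] for key in reference)


def moving_toward_normal(before, after, normal):
    """True when the post-treatment pattern is closer to normal than before."""
    return distance(after, normal) < distance(before, normal)


# Hypothetical patterns: True = accessible, False = not accessible.
normal = {"A2": False, "A6": False, "B57": False, "C16": False}
before = {"A2": True, "A6": True, "B57": True, "C16": True}
after = {"A2": False, "A6": True, "B57": False, "C16": False}

print(distance(before, normal), distance(after, normal))
print(moving_toward_normal(before, after, normal))
```

In this sketch the post-treatment pattern still differs from normal at one element, yet it
is scored as progressing, consistent with the statement that the pattern need not be
exactly like that of a normal cell.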

The chromatin status patterns of the present invention can be correlated to
transposable element expression patterns and/or methylation patterns described herein, such
that one of skill in the art, upon obtaining a particular expression pattern and/or methylation
pattern, will also know what the chromatin status of the sample is. Also, upon obtaining a
particular chromatin status pattern, one of skill in the art will also know the expression
pattern and/or methylation pattern of the sample.

The methods of the present invention can also be utilized to differentiate between
subtypes of cancers. For example, mantle cell lymphoma and grades I/II follicular
lymphoma are subtypes of non-Hodgkin's lymphoma. Similarly, adenocarcinoma, large
cell carcinoma, spindle cell carcinoma, squamous cell carcinoma, adenosquamous
carcinoma and small cell carcinoma are all subtypes of lung cancer. Numerous subtypes for
other cancers are also known and they can be differentiated by the methods of the present
invention. By utilizing the expression patterns, chromatin status patterns and/or methylation
patterns of cells associated with these subtypes, the skilled artisan can make a more accurate
diagnosis of a particular type of cancer. The differences in the expression patterns,
chromatin status and methylation patterns of the transposable element sequences allow the
skilled artisan to differentiate between subtypes and thus better stage the cancer as well as
administer treatment best suited for a specific cancer subtype.

The present invention also provides a computer system comprising a) a database
including records comprising a plurality of reference retroelement expression patterns, and
associated diagnosis and therapy data; and b) a user interface capable of receiving a
selection of one or more test retroelement expression patterns for use in determining
matches between a test retroelement expression pattern and a reference retroelement
expression pattern, and displaying the records associated with matching expression patterns.
The computer systems of the present invention can also include a) a database including records
comprising a plurality of reference methylation patterns, and associated diagnosis and
therapy data; and b) a user interface capable of receiving a selection of one or more test
methylation patterns for use in determining matches between a test methylation pattern and
the reference methylation pattern, and displaying the records associated with matching
methylation patterns. Also provided is a computer system comprising a) a database including
records comprising a plurality of reference chromatin status patterns, and associated
diagnosis and therapy data; and b) a user interface capable of receiving a selection of one or
more test chromatin status patterns for use in determining matches between a test chromatin
status pattern and a reference chromatin status pattern, and displaying the records associated
with matching chromatin status patterns.
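
A database of reference patterns with associated diagnosis and therapy data could be
organized in many ways. The following is a minimal sketch using an in-memory SQLite
database; the table name, column names, string encoding of patterns, exact-match lookup
and all record contents are assumptions made for illustration, not features required by
the invention.

```python
import sqlite3


def encode(pattern):
    """Serialize {element: accessible} into a canonical, order-independent string."""
    return ";".join(f"{key}:{int(value)}" for key, value in sorted(pattern.items()))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reference_patterns ("
    "pattern_type TEXT, pattern TEXT, diagnosis TEXT, therapy TEXT)"
)

# Hypothetical records; a real library would be built from characterized samples.
records = [
    ("chromatin", encode({"A2": True, "A4": False, "B57": True}),
     "hypothetical cancer type X, stage I", "hypothetical therapy 1"),
    ("chromatin", encode({"A2": False, "A4": False, "B57": False}),
     "no cancer detected", "none"),
]
conn.executemany("INSERT INTO reference_patterns VALUES (?, ?, ?, ?)", records)


def lookup(connection, pattern_type, test_pattern):
    """Return (diagnosis, therapy) records whose reference pattern matches exactly."""
    rows = connection.execute(
        "SELECT diagnosis, therapy FROM reference_patterns "
        "WHERE pattern_type = ? AND pattern = ?",
        (pattern_type, encode(test_pattern)),
    )
    return rows.fetchall()


matches = lookup(conn, "chromatin", {"B57": True, "A2": True, "A4": False})
print(matches)
```

The `pattern_type` column lets expression, methylation and chromatin status patterns share
one table; exact matching is the simplest case, and a similarity score could be
substituted where partial matches are of interest.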

It will be appreciated by those skilled in the art that expression patterns, methylation
patterns and/or chromatin status patterns identified from subjects can be stored, recorded,
and manipulated on any medium which can be read and accessed by a computer. As used
herein, the words "recorded" and "stored" refer to a process for storing information on a
computer medium. A skilled artisan can readily adopt any of the presently known methods
for recording information on a computer readable medium to generate a list of sequences
comprising one or more of the nucleic acids of the invention. Another aspect of the present
invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, 20,
25, 30, 50, 100, 200, 250, 300, 400, 500, 1000, 2000, 3000, 4000 or 5000 expression
patterns, methylation patterns and/or chromatin status patterns of the invention or patterns
identified from subjects.

Computer readable media include magnetically readable media, optically readable
media, electronically readable media and magnetic/optical media. For example, the
computer readable media may be a hard disc, a floppy disc, a magnetic tape, CD-ROM,
DVD, RAM, or ROM as well as other types of media known to those skilled in the
art.

Embodiments of the present invention include systems, particularly computer
systems which contain the sequence information described herein. As used herein, "a
computer system" refers to the hardware components, software components, and data
storage components used to store and/or analyze the expression patterns of the present
invention or other expression patterns. The computer system preferably includes the
computer readable media described above, and a processor for accessing and manipulating
the data.

Preferably, the computer is a general purpose system that comprises a central
processing unit (CPU), one or more data storage components for storing data, and one or
more data retrieving devices for retrieving the data stored on the data storage components.
A skilled artisan can readily appreciate that any one of the currently available computer
systems is suitable.

In one particular embodiment, the computer system includes a processor connected
to a bus which is connected to a main memory, preferably implemented as RAM, and one or
more data storage devices, such as a hard drive and/or other computer readable media
having data recorded thereon. In some embodiments, the computer system further includes
one or more data retrieving devices for reading the data stored on the data storage
components. The data retrieving device may represent, for example, a floppy disk drive, a
compact disk drive, a magnetic tape drive, a hard disk drive, a CD-ROM drive, a DVD
drive, etc. In some embodiments, the data storage component is a removable computer
readable medium such as a floppy disk, a compact disk, a magnetic tape, etc. containing
control logic and/or data recorded thereon. The computer system may advantageously
include or be programmed by appropriate software for reading the control logic and/or the
data from the data storage component once inserted in the data retrieving device.

In some embodiments, the computer system may further comprise an expression
pattern comparer for comparing the expression pattern(s) stored on a computer readable
medium to other expression pattern(s) stored on a computer readable medium. An "expression
pattern comparer" refers to one or more programs which are implemented on the computer
system to compare a nucleotide sequence with other nucleotide sequences. Similarly,
programs capable of comparing methylation status patterns and chromatin status patterns
are also contemplated by the present invention.
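
As one hedged illustration of a comparer for quantitative expression patterns, the sketch
below scores two patterns by the Pearson correlation of their per-family expression
levels. The choice of Pearson correlation, the function names and all expression values
are assumptions for this sketch; any suitable similarity measure could be substituted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a flat pattern carries no shape information
    return cov / (spread_x * spread_y)


def compare_expression(test, reference):
    """Correlate expression levels over the families present in both patterns."""
    families = sorted(test.keys() & reference.keys())
    if len(families) < 2:
        raise ValueError("need at least two shared families to compare")
    return pearson([test[f] for f in families], [reference[f] for f in families])


# Hypothetical relative expression levels per transposable element family.
test_pattern = {"HERV-K": 8.0, "HERV-W": 6.0, "L1": 4.0}
reference_cancer = {"HERV-K": 4.0, "HERV-W": 3.0, "L1": 2.0}
reference_normal = {"HERV-K": 1.0, "HERV-W": 2.0, "L1": 3.0}

print(compare_expression(test_pattern, reference_cancer))
print(compare_expression(test_pattern, reference_normal))
```

Correlation compares the shape of two patterns rather than their absolute levels, which
can be convenient when samples are measured on different scales.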

This invention also provides for a computer program that correlates expression
patterns with a particular stage of cancer. Similarly, the present invention also provides a
computer program that correlates methylation patterns with a particular stage of cancer.
Also provided is a computer program that correlates chromatin status with a particular stage
of cancer. The computer programs of this invention can optionally include treatment
options or drug indications for subjects with expression patterns associated with cancer or
the risk of developing cancer.

The present invention is more particularly described in the following examples
which are intended as illustrative only since numerous modifications and variations therein
will be apparent to those skilled in the art.

Expression changes

Semi-quantitative RT-PCR was performed to quantify changes in expression from
different HERV families, as well as LINEs and SINEs, amongst a small set of malignant,
benign, and borderline tumors and non-cancerous ovarian tissue samples. Figure 1 shows
the upregulation of HERV-K and HERV-W families in a cancer sample, compared with a
non-cancer sample.

Methylation status

Methylation levels of HERV-W and L1 were compared among different ovarian
samples. Ten micrograms of genomic DNA were digested either with a methylation
sensitive restriction enzyme (HpaII) or with its methylation insensitive isoschizomer (MspI).
These enzymes recognize the palindromic sequence CCGG, which is found in diverse
positions in the promoter regions of these retroelements. Digestion is carried out overnight
at 37°C with a 10- to 16-fold excess of the needed enzyme to ensure complete digestion of the DNA. A
control for DNase contamination is included by incubating the same amount of DNA with
buffer and water without the enzyme. Digested DNA is run on an agarose gel and
transferred to a nylon membrane with NaOH. Membranes are then prehybridized for 1 hour
with 10 mg of herring sperm DNA per milliliter of Church buffer, and hybridized
overnight at 65°C with probes for HERV-K, HERV-W or L1 respectively.
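
The logic of the HpaII/MspI comparison can be illustrated with a simplified in silico
digestion. This is a sketch only: the function name, the test sequence and the methylated
site are hypothetical, and the model assumes that both enzymes cut within CCGG after the
first cytosine, that HpaII is blocked only by methylation of the internal cytosine, and
that MspI is unaffected by that methylation.

```python
def digest(seq, enzyme, methylated_sites=frozenset()):
    """Return fragment lengths after cutting a linear sequence at CCGG sites.

    methylated_sites -- 0-based start positions of CCGG sites whose internal
    cytosine is methylated; HpaII does not cut these sites, MspI does.
    """
    if enzyme not in ("MspI", "HpaII"):
        raise ValueError("enzyme must be 'MspI' or 'HpaII'")
    cuts = []
    start = seq.find("CCGG")
    while start != -1:
        blocked = enzyme == "HpaII" and start in methylated_sites
        if not blocked:
            cuts.append(start + 1)  # cut between the two cytosines (C^CGG)
        start = seq.find("CCGG", start + 1)
    edges = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(edges, edges[1:])]


sequence = "AAAACCGGTTTTTTCCGGAA"  # hypothetical; CCGG sites start at 4 and 14
methylated = {14}                   # hypothetical: second site is methylated

print(digest(sequence, "MspI", methylated))
print(digest(sequence, "HpaII", methylated))
```

Smaller fragments that appear in the MspI lane but are missing from the HpaII lane
therefore indicate methylated sites, and their reappearance in the HpaII lane indicates
loss of methylation.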

Probe design was based on the hypothesis that relevant DNA methylation changes, if 
any, would include the predicted promoter regions of retrotransposons. 

Figure 2 shows the results obtained after using a probe for the promoter region of
HERV-W. After digestion with MspI, different bands with approximately the same sizes are
observed in cancer, benign, borderline (LMP) and non-cancerous (Non-Cr) samples. After
digestion with the methylation sensitive restriction enzyme HpaII, the bands are weaker but
still present in most of the cancer samples, while most of the bands, and especially the
smaller ones, are absent in the benign, borderline and non-cancerous samples. This result
indicates that some methylation has been lost in the cancer samples.

Southern blot analysis. LINE1 probe

Figure 3 shows a Southern blot analysis of genomic DNA after digestion with MspI
(M) or its methylation-sensitive isoschizomer HpaII (H), respectively, hybridized with a LINE1
probe spanning the putative promoter region of the element. Equal amounts of DNA were
loaded per sample, i.e. per MspI/HpaII pair. Fragment sizes range from 0.1 kb to >3.0 kb.
Samples represent ovarian carcinoma (T - malignant), borderline ovarian tumor (B) and
non-tumor ovarian tissue (N).

Fragments between 1.4-2 kb as well as 0.4-0.7 kb (arrows) in HpaII digests appear
more pronounced in the malignant tissue samples compared to the non-tumor samples,
indicating extensive cytosine methylation of this particular LINE1 region in non-carcinoma
ovarian tissue and loss of LINE1 methylation in some ovarian carcinoma samples.

Southern blot images are consistent with hypomethylation of HERV-W and LINE1
elements, respectively, in ovarian carcinoma versus normal ovarian tissue. The changes are
more pronounced for HERV-W and more consistent among carcinoma samples. There is
some heterogeneity for the effect among the samples tested, which will be correlated with
clinical history of the tumors and treatment responses.

EXAMPLE II

Wide-spread hypomethylation of CpG dinucleotides is characteristic of many
cancers. Retrotransposons have been identified as potential targets of hypomethylation
during cellular transformation. The following example provides the results of an
examination of the methylation status of CpG dinucleotides associated with the L1 and
HERV-W retrotransposons in benign and malignant human ovarian tumors. A reduction in
the methylation of CpG dinucleotides was found within the promoter regions of these
retroelements in malignant relative to non-malignant ovarian tissues. Consistent with these
results, it was also found that relative L1 and human endogenous retrovirus-W (HERV-W)
expression levels are elevated in representative samples of malignant vs. non-malignant
ovarian tissues.

The results of a preliminary examination of the methylation status of CpG 
dinucleotides associated with two representative families of retrotransposons in benign and 
malignant human ovarian tumors are provided herein. L1 is the most abundant family of 
human LINE elements, comprising about 17% of the genome [22]. Human Endogenous 
Retrovirus-W (HERV-W) is a family of LTR retrotransposons consisting of ~140 full-length 
or truncated elements randomly dispersed throughout the human genome [23]. These results 
demonstrate that large numbers of both families of retrotransposons are hypomethylated in 
ovarian carcinomas. It is further demonstrated that relative levels of both L1 and HERV-W 
expression are elevated in representative samples of malignant vs. non-malignant ovarian 
tissues. The findings presented herein are consistent with the hypothesis that 
retrotransposons are a major target of global hypomethylation associated with cellular 
transformation. 

To test the hypothesis that L1 and HERV-W elements may experience reduced 
methylation in malignant ovarian carcinomas, a restriction-enzyme-based assay was utilized 
to compare the methylation status of CpG dinucleotides located within the promoter regions 
of these elements in a series of malignant and non-malignant ovarian tissues. The restriction 
enzymes MspI and HpaII both recognize the sequence CCGG, but HpaII only cuts when the 
recognition sequence is unmethylated at the inner cytosine (i.e., the second C of CCGG), 
while MspI is indifferent to the methylation status of the inner cytosine. 
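
The logic of the MspI/HpaII comparison can be illustrated with a minimal in-silico sketch. It is an illustration only, not part of the assay described here: the function name digest, the toy sequence and the methylated-site indices are hypothetical, and the sketch assumes only that both enzymes cut between the two cytosines of CCGG and that HpaII is blocked when the inner cytosine is methylated.

```python
def digest(seq, enzyme, methylated_sites=frozenset()):
    """Return fragment lengths after cutting seq at CCGG sites.

    enzyme: "MspI" cuts every CCGG site; "HpaII" skips sites whose
        inner cytosine is methylated.
    methylated_sites: 0-based start indices of CCGG sites whose inner
        cytosine is methylated (hypothetical input for illustration).
    """
    if enzyme not in ("MspI", "HpaII"):
        raise ValueError("enzyme must be 'MspI' or 'HpaII'")
    cuts = []
    start = seq.find("CCGG")
    while start != -1:
        blocked = enzyme == "HpaII" and start in methylated_sites
        if not blocked:
            cuts.append(start + 1)  # both enzymes cut C^CGG
        start = seq.find("CCGG", start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy 20-nt sequence with CCGG sites starting at indices 4 and 14.
toy = "AAAACCGGTTTTTTCCGGAA"
msp_fragments = digest(toy, "MspI", {4, 14})   # methylation is ignored
hpa_partial = digest(toy, "HpaII", {4})        # one site protected
hpa_full = digest(toy, "HpaII", {4, 14})       # both sites protected
```

In this sketch a fully methylated element yields no internal HpaII fragments while MspI yields the same fragments regardless of methylation, which is the contrast the Southern blots are read for.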

Figures 4A and 4B display Southern blots of HpaII- and MspI-digested genomic DNA 
isolated from tissue samples and hybridized against probes homologous to regions 
encompassing the promoter regions of each family of elements. The HpaII/MspI restriction 
sites located within the promoter regions of both L1 and HERV-W elements are 
polymorphic among family members. By aligning the promoter regions of both families of 
elements present in the consensus human genome [http://genome.ucsc.edu/] and identifying 
the HpaII/MspI sites present, the expected size range of restriction 
fragments within the elements was estimated to be ~100-700 bp and ~1500-3000 bp for L1 
elements and ~100-500 bp for HERV-W elements. Larger fragments 
representing partial digestions and/or polymorphic HpaII/MspI sites located within the 
elements or in regions flanking the elements are also visible. 

The results presented in Figures 4A and 4B show that MspI-generated bands within the 
expected size range of internal fragments were visible in digestions of DNA from all tissue 
samples. In contrast, HpaII-generated fragments within the expected size range were only 
visible in digestions of DNA from the malignant samples. These results are indicative of a 
consistent reduction in the methylation of CpG dinucleotides within the promoter regions of 
both L1 and HERV-W elements in the malignant tissue. The fact that the number and 
intensity of HpaII-generated bands in the malignant samples are significantly less than those 
generated by MspI digestion indicates that some L1 and HERV-W elements remain 
hypermethylated in the malignant samples. Regardless, this is the first report of the 
hypomethylation of L1 elements in ovarian carcinomas and of the hypomethylation of 
HERV-W in any human cancer. 

As noted above, hypomethylation of retroelement promoter regions can be expected 
to result in a localized relaxation of chromatin structure and a corresponding increase in 
element expression [e.g., 10]. In order to test this prediction in these samples, total RNA 
was extracted from representative samples of two malignant and two non-malignant ovarian 
tissues and quantitative real-time RT-PCR was conducted. Two replicate assays were run 
for each tissue sample. The results shown in Figure 4C indicate a significant average 
increase in both L1 and HERV-W expression in the malignant vs. non-malignant ovarian 
tissues examined. 

Hypomethylation is generally associated with the relaxation of chromatin structure, 
an increased accessibility of transcription factors and a consequent elevation in levels of 
expression [27]. The present findings are generally consistent with these prior results. Since 
transcription is a rate-limiting step in retrotransposition [11], hypomethylation might be 
expected to result in an increase in retrotransposon insertion mutations. While there have 
been occasional reports of L1 and other retrotransposon insertion mutations implicated in 
cancer development in humans [e.g., 28], this may not be as significant a factor as it 
apparently is in the mouse [29], perhaps because most L1 and other retrotransposon 
sequences in the human genome are believed to be truncated or otherwise transpositionally 
defective [30]. 

Another possible consequence of the hypomethylation of retroelements in humans is 
the opportunity it provides for ectopic pairing and recombination among homologous 
elements dispersed throughout the genome. The unequal-crossover events typically 
associated with ectopic recombination might well account for at least some of the various 
chromosomal aberrations and aneuploid events characteristic of human malignancies. 
Indeed, direct evidence of such an effect has recently been documented in mice [31, 32]. In 
humans, L1 retrotransposition events have been shown to induce various forms of 
chromosomal instabilities [33] and L1 and other retrotransposon sequences have frequently 
been linked with a variety of chromosomal aberrations associated with human cancers [e.g., 
34]. 

A third possible consequence of the hypomethylation of retroelements in cancer 
cells is the potential regulatory impact of the release of methylation complexes known to be 
bound to these elements in post-embryonic somatic cells [e.g., 35]. Although little is 
currently understood concerning the factors that determine the relative affinity of 
methylation complexes for DNA target sequences, retrotransposons are known to be high 
affinity targets [e.g., 10]. Complexes released from retroelements may initiate a cascade of 
regulatory changes by binding to other lower affinity target sites and possibly resulting in 
the down regulation of genes essential for DNA repair and genome stability. 

Tissue samples, DNA extraction, Southern hybridization 

Bulk ovarian tissue samples were surgically removed and placed in RNAlater 
(Ambion, Austin, TX) in the operating room within 1 minute of removal from the patients. 
The pathological and clinical information of each sample is as follows: Sample #11 (Age 
43), Adenocarcinoma (papillary serous, poorly differentiated, Stage IIc); Sample #18 (Age 
34), Adenocarcinoma (endometrioid, well differentiated, Stage IIb); Sample #19 (Age 57), 
Adenocarcinoma (papillary serous, poorly differentiated, Stage IIc); Sample #21 (Age 80), 
Malignant mixed mullerian; Sample #23 (Age 52), Adenocarcinoma (papillary serous, 
poorly differentiated, Stage IIa); Sample #29 (Age 66), Adenocarcinoma (papillary serous, 
poorly differentiated, Stage III); Sample #15 (Age 54), Serous borderline/low-malignancy 
potential; Sample #31 (Age 40), Benign cystic masses; Sample #16 (Age 53), Normal 
ovary; Sample #89 (Age 53), Normal ovary. This study was approved by the Institutional 
Review Board of the University of Georgia and of Northside Hospital (Atlanta), from which 
the samples were obtained. 

Genomic DNA was extracted by proteinase K digestion of 20-25 mg of bulk ovarian 
tissue and phenol-chloroform extraction. DNA was ethanol precipitated and re-suspended 
in water. Ten micrograms of genomic DNA were digested overnight at 37°C with a 10- to 
16-fold excess of either HpaII [methylation-sensitive restriction enzyme] or MspI [not 
sensitive to methylation at the internal cytosine]. These enzymes recognize the sequence 
CCGG, which is found in diverse positions in the promoter regions of these retroelements. 
Digested DNA was resolved on an agarose gel and transferred to a nylon membrane 
(Hybond N; Amersham Biosciences, Piscataway, NJ) with NaOH. Membranes were 
prehybridized for 1 hour with 10 mg/ml of herring sperm DNA in Church buffer [0.5 M 
NaH2PO4, 7% SDS and lOM EDTA] and hybridized overnight at 65°C in the same buffer 
with 100-200 ng of probe DNA labeled with [α-32P]dCTP using a Nick Translation Kit 
(Roche, Indianapolis, IN). Filters were washed twice for 15 min in 2x SSC and 0.1% SDS 
and then twice for 30 min in 1x SSC and 0.1% SDS at 65°C and exposed to 
Phosphorimager screens (Molecular Dynamics, Sunnyvale, CA). 

The HERV-W probe was designed in the LTR region, downstream of the putative 
TTAAAT box. PCR was performed on genomic DNA with forward primer HERVF 5'- 
CCACCACTGCTGTTTGCCAC-3' (SEQ ID NO: 771) and reverse primer HERVR 5'- 
GCCTCGTGTTCTCTGACCTGGGG-3' (SEQ ID NO: 772), producing a 304 bp fragment. 
The LINE1 probe for the promoter region was designed according to Takai et al. [18]. PCR 
was performed on genomic DNA with forward primer L1F 5'- 
CGGGTGATTTCTGCATTTCC-3' (SEQ ID NO: 773) and reverse primer L1R 5'- 
GACATTTAAGTCTGCAGAGG-3' (SEQ ID NO: 774), giving a product of 540 bp. PCR 
products were cloned into pCR2.1-TOPO and transformed into TOP10 E. coli cells 
(Invitrogen, Carlsbad, CA). Plasmids were extracted (Qiaprep Spin Miniprep Kit, Qiagen, 
Valencia, CA) and sequenced. Subsequent PCR reactions were performed on cloned 
plasmid DNA for both HERV-W and LINE1, and gel-extracted PCR products were used as 
hybridization probes. 
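
As a quick sanity check on primer sequences such as those listed above, GC content and a rough melting temperature can be computed. The sketch below is illustrative only: the helper names gc_fraction and wallace_tm are hypothetical, and the Wallace rule, Tm = 2(A+T) + 4(G+C), is a common rule of thumb for short oligonucleotides that is swapped in here for illustration. It is not stated to be the method by which these primers were designed.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer sequence."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Probe primer sequences copied from the text (SEQ ID NOs: 771-774).
probe_primers = {
    "HERV-W forward": "CCACCACTGCTGTTTGCCAC",
    "HERV-W reverse": "GCCTCGTGTTCTCTGACCTGGGG",
    "LINE1 forward": "CGGGTGATTTCTGCATTTCC",
    "LINE1 reverse": "GACATTTAAGTCTGCAGAGG",
}

for name, seq in probe_primers.items():
    print(name, len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
```

Values from the Wallace rule are only approximate for primers of 20 nt or longer; nearest-neighbor methods would be the usual choice where accuracy matters.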

RNA extraction, quantitative real-time RT-PCR 

Total RNA was extracted using Trizol Reagent (Invitrogen, Carlsbad, CA) and 2-5 
µg of total RNA were reverse transcribed into first-strand cDNA using the Thermoscript 
RT-PCR system (Invitrogen, Carlsbad, CA) in a final volume of 20 µl. The HERV-W 
primers used were: forward 5'-TTGGCGGTATCACAACCTCT-3' (SEQ ID NO: 775); 
reverse 5'-GTGAGGATTGGGGATTGA-3' (SEQ ID NO: 776) (product size: 230 bp) 
based on the HERV-W sequence (GenBank accession no. AC000064). The LINE-1 
primers were: forward 5'-TCATAAAGCAAGTCCTCAGTGAGC-3' (SEQ ID NO: 777); 
reverse 5'-GGGGTGGAGAGTTCTGTAGATGTC-3' (SEQ ID NO: 778) (product size: 165 
bp) based on the LINE-1 sequence (GenBank accession no. M80343). Real-time 
monitoring of PCR reactions was performed using the DNA Engine Opticon 2 System (MJ 
Research, Waltham, MA) and the SYBR Green iQ dye (BioRad, Hercules, CA) [24]. For 
each reaction, the amounts of a target and of an endogenous control (Ribosomal Protein 
S27A) were determined using a calibration curve, and the amount of target molecule was 
divided by the amount of endogenous reference to obtain a normalized target value [25]. 
RPS27A has been previously identified as a valid control gene in expression studies 
conducted among human malignant and control tissues [26]. In addition, microarray 
analyses were utilized to independently verify that RPS27A expression levels are constant 
among the samples examined in this study. Separate calibration (standard) curves for 
RPS27A, HERV-W and LINE-1 were constructed using serial dilutions of total cDNA from 
normal human ovarian tissue (purchased from Ambion, Austin, TX). Standards for HERV-W, 
LINE-1 and RPS27A were defined to contain an arbitrary starting concentration, and 
serial dilutions were used to construct the standard curve. Standard curve calibrations were 
included in each assay. 
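
The standard-curve normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis software used in the study: the function names are hypothetical, the threshold-cycle (Ct) values and dilution quantities are invented for the example, and the curve is assumed to be a straight-line fit of Ct against log10 of the arbitrary starting quantity.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Interpolate a relative quantity from a Ct using a fitted curve."""
    return 10 ** ((ct - intercept) / slope)


def normalized_target(ct_target, target_curve, ct_ref, ref_curve):
    """Target quantity divided by endogenous reference quantity."""
    target = quantity_from_ct(ct_target, *target_curve)
    reference = quantity_from_ct(ct_ref, *ref_curve)
    return target / reference


# Hypothetical serial dilution: arbitrary units and made-up Ct values.
dilutions = [1000, 100, 10, 1]
example_cts = [20.0, 23.0, 26.0, 29.0]
curve = fit_standard_curve(dilutions, example_cts)

# Same curve reused for target and reference only to keep the example short;
# in the text, separate curves are built for each amplicon.
ratio = normalized_target(23.0, curve, 26.0, curve)
```

Because the starting concentration of the standards is arbitrary, the normalized value is meaningful only as a relative measure compared across samples run against the same curves.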
Microarray Analysis of Cancer Cells 

Table 2 shows a ranking of relative retroelement expression values comparing 
benign (control) vs. malignant (cancer) samples obtained via microarray analysis on a gene 
chip (Figure 5). The results of this experiment show that some retroelement families show a 
significant increase in expression in cancer (Stage III ovarian carcinoma) vs. controls 
(negative values in the Comparison Rank column), some show no net change (values in the 
Comparison Rank column around 0) and some show a decrease in net levels (positive values 
in the Comparison Rank column). The changes in expression can be due to changes in 
chromatin structure. Thus, this data set shows that there is a heterogeneous response in 
changes in chromatin structure in stage III tumors. This example utilizing stage III tumor 
samples is not limited to a particular stage or type of cancer and is merely illustrative of the 
kind of changes in retroelement expression that can be analyzed by the methods of the 
present invention in order to diagnose, stage and treat any type of cancer. 
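
The sign convention described above can be illustrated with a simple benign-minus-cancer difference of mean log2 values. This sketch is not the Comparison Rank statistic itself, whose exact formula is not given in this section. It assumes that the B77log and B53log columns of Table 2 are benign samples and the C141log and C154log columns are cancer samples; the function names and the 0.05 tolerance are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def log2_difference(benign, cancer):
    """Mean benign log2 value minus mean cancer log2 value.

    Negative: higher in cancer; positive: lower in cancer, matching the
    sign convention described for the Comparison Rank column.
    """
    return mean(benign) - mean(cancer)


def direction(diff, tol=0.05):
    """Classify a difference; tol is an arbitrary illustrative cutoff."""
    if diff < -tol:
        return "higher in cancer"
    if diff > tol:
        return "lower in cancer"
    return "no net change"


# Log2 values copied from two Table 2 rows (assumed column grouping).
l1me1 = log2_difference((1.35077862, 1.78180622), (1.69332148, 1.64623708))
herv17 = log2_difference((2.96503482, 1.86327413), (1.81145688, 1.18631188))

print("L1ME1", round(l1me1, 4), direction(l1me1))
print("HERV17_C", round(herv17, 4), direction(herv17))
```

With only two samples per group, such differences indicate direction rather than statistical significance.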

Throughout this application, various publications are referenced. The disclosures of 
these publications in their entireties are hereby incorporated by reference into this 
application in order to more fully describe the state of the art to which this invention 
pertains. 

It will be apparent to those skilled in the art that various modifications and variations 
can be made in the present invention without departing from the scope or spirit of the 
invention. Other embodiments of the invention will be apparent to those skilled in the art 
from consideration of the specification and practice of the invention disclosed herein. It is 
intended that the specification and examples be considered as exemplary only, with a true 
scope and spirit of the invention being indicated by the following claims. 
TABLE 2 

Genename    Description                                B77log      B53log      C141log     C154log     Comparison Rank 
L1ME1       LINE1, ME1 subfamily                       1.35077862  1.78180622  1.69332148  1.64623708  -0.306105083 
ALU_C       SINE element                               0.68972892  0.9183396   0.80555819  0.87181976  -0.166204761 
LTR5_C      long terminal repeat                       1.94516871  1.56669724  1.03574106  1.95720687  0.282267811 
L1MA4A      LINE1, MA4A subfamily                      1.55470712  2.1847541   1.72191098  1.71634687  0.335083736 
HERVL74     Human endogenous retrovirus, subfamily L   2.1348742   1.70081483  1.97225587  6.94321787  0.444734906 
L1MD1_5_B   LINE1, MD1 subfamily                       1.72196204  2.2003511   1.81762843  1.58184923  0.517665856 
MIR3_C      SINE element                               2.1814338   1.89379992  1.94937867  1.54700864  0.593194055 
L1MB3_5     LINE1, MB3 subfamily                       2.2090425   1.633133    1.65469321  1.42120887  0.669435686 
L1PREC2_C   LINE1, PREC2 subfamily                     2.55292039  2.16451509  2.15268908  1.39347057  0.721679935 
HERV17_C    Human endogenous retrovirus, subfamily 17  2.96503482  1.86327413  1.81145688  1.18631188  0.749541436 
TIGGER2_C   DNA transposon                             2.36529271  1.63334668  1.52355074  1.33672167  0.876108867 
ZAPHOD      DNA transposon                             2.1513326   1.7663077   1.64906155  1.3920269   0.965355576 
SVA_C       SINE-R (non retroviral retrotransposon)    2.2227769   1.89286675  1.73386684  1.30913517  1.005075735 
HERVE_C     Human endogenous retrovirus, subfamily E   2.45155247  1.77868979  1.61843377  1.53897952  1.008357796 
LTR68       long terminal repeat                       2.34333093  2.07355412  1.93739866  1.63957228  1.04634535 
CHARLIE3_C  DNA transposon                             2.35703636  1.70038524  1.48926233  1.37092819  1.092369458 
L1PA2_C     LINE1, PA2 subfamily                       2.16239562  2.31209291  1.97830497  1.45958445  1.096598938 
THE1A_C     MaLR-mammalian LTR retrotransposon         2.00541667  1.93515248  1.74245596  1.15032661  1.118514825 
HERVK_C     Human endogenous retrovirus, subfamily K   2.0061171   2.15653499  1.82253452  1.40105752  1.161079999 
L1_C        LINE1                                      2.49301356  2.34060322  2.02819922  1.25668997  1.185293378 
L3_C        LINE3                                      2.35638086  2.00908158  1.74395501  1.54420679  1.392505357 
MLT2A1_C    MaLR-mammalian LTR retrotransposon         2.40138399  2.03382426  1.77178165  1.60782029  1.404321263 
L1MC3_C     LINE1, MC3 subfamily                       2.40070124  2.12369076  1.75851006  1.38915384  1.506101383 
HAL1B       non-autonomous derivative of LINE1         2.24611928  2.11701552  1.76240173  1.29920584  1.553805998 
LTR17_C     terminal repeat                            1.83016919  1.99673012  1.70364718  1.66104849  1.562573711 
MER74C      MaLR-mammalian LTR retrotransposon         2.10832145  2.03572708  1.61778714  1.04521613  1.623238292 
L1PA7_C     LINE1, PA7 subfamily                       2.36314897  2.35395921  1.96388533  1.42191829  1.707997573 
LTR6A       long terminal repeat                       1.86476687  2.15684185  1.54696871  1.4465473   1.852173244 
MER119      non-autonomous retroelement                2.08618876  1.8328609   1.65129333  1.51283891  2.071811546 
HERVL_C     Human endogenous retrovirus, subfamily L   2.39027926  2.12124503  1.74133356  1.64196556  2.165501757 
TIGGER1_C   DNA transposon                             2.07714571  2.0604822   1.80109953  1.57511768  2.218870626 
MIR_C       mammalian-wide interspersed repeat         2.1449389   2.2361877   1.82011015  1.62411927  2.3063887 
THE1BR_C    MaLR-mammalian LTR retrotransposon         2.0698519   2.07895536  1.72412613  1.67293527  8.816162784 

Ranking of genes as computed by the noise to signal ratio derived from mean expression levels at three positions 
on a log2 scale: differential expression between cancer and benign. 